INTRODUCTION

This is an application note on using numerics in Intel’s
iAPX 86 or iAPX 88 microprocessor family. The numerics
implemented in the family provide instruction-level
support for high-precision integer and floating point
data types, with arithmetic operations such as add,
subtract, multiply, divide, square root, power, log, and
trigonometric functions. These features are provided by
members of the iAPX 86 or iAPX 88 family called numeric
data processors.

Rather than concentrate on a narrow, specific applica- 
tion, the topics covered in this application note were 
chosen for generality across many applications. The 
goal is to provide sufficient background information so 
that software and hardware engineers can quickly move 
beyond needs specific to the numeric data processor and 
concentrate on the special needs of their application. 
The material is structured to allow quick identification 
of relevant material without reading all the material 
leading up to that point. Everyone should read the in- 
troduction to establish terminology and a basic 
background. 

iAPX 86,88 BASE 

The numeric data processor is based on an 8088 or 8086 
microprocessor. The 8086 and 8088 are general purpose 
microprocessors, designed for general data processing 
applications. General applications need fast, efficient 
data movement and program control instructions. Ac- 
tual arithmetic on data values is simple in general appli- 
cations. The 8086 and 8088 fulfill these needs in a low 
cost, effective manner. 

However, some applications need more powerful arith- 
metic instructions and data types than a general purpose 
data processor provides. The real world deals in frac- 
tional values and requires arithmetic operations like 
square root, sine, and logarithms. Integer data types 
and their operations like add, subtract, multiply, and 
divide may not meet the needs for accuracy, speed, and 
ease of use. 

Such functions are not simple or inexpensive. The 
general data processor does not provide these features 
due to their cost to other less-complex applications that 
do not need such features. A special processor is re- 
quired, one which is easy to use and has a high level of 
support in hardware and software. 

The numeric data processor provides these features. It 
supports the data types and operations needed and 
allows use of all the current hardware and software sup- 
port for the iAPX 86/10 and 88/10 microprocessors. 

The iAPX 86 and iAPX 88 provide two imple- 
mentations of a numeric data processor. Each offers 
different tradeoffs in performance, memory size, and 
cost. 


One alternative uses a special hardware component, the
8087 numeric processor extension, while the other is
based on software, the 8087 emulator. Both component
and software emulator add the extra numerics data
types and operations to the 8086 or 8088.

The component and its software emulator are com- 
pletely compatible. 

Nomenclature 

Table 1 shows several possible configurations
of the iAPX 86 and iAPX 88 microprocessor family.
The choice of configuration will be decided by the
needs of the application for cost and performance
in the areas of general data processing, numerics,
and I/O processing. The combination of an 8086 or
8088 with an 8087 is called an iAPX 86/20 or 88/20
numeric data processor. For applications requir- 
ing high I/O bandwidths and numeric perfor- 
mance, a combination of 8086, 8087 and 8089 is an 
iAPX 86/21 numerics and I/O data processor. The 
same system with an 8088 CPU for smaller size 
and lower cost, due to the smaller 8-bit wide 
system data bus, is referred to as an iAPX 88/21. 
Each 8089 in the system is designated in the units 
digit of the system designation. The term 86/2X or 
88/2X refers to a numeric data processor with any 
number of 8089s. 

Throughout this application note, I will use the 
terms NDP, numeric data processor, 86/2X, and 
88/2X synonymously. Numeric processor exten- 
sion and NPX are also synonymous for the func- 
tions of either the 8087 component or 8087 
emulator. The term numeric instruction or 
numeric data type refers to an instruction or data 
type made available by the NPX. The term host will 
refer to either the 8086 or 8088 microprocessor. 


Table 1. Components Used in iAPX 86,88
Configurations

System Name    8086   8087   8088   8089

iAPX 86/10      1      -      -      -
iAPX 86/11      1      -      -      1
iAPX 86/12      1      -      -      2
iAPX 86/20      1      1      -      -
iAPX 86/21      1      1      -      1
iAPX 86/22      1      1      -      2
iAPX 88/10      -      -      1      -
iAPX 88/11      -      -      1      1
iAPX 88/12      -      -      1      2
iAPX 88/20      -      1      1      -
iAPX 88/21      -      1      1      1
iAPX 88/22      -      1      1      2
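
The naming rule described under Nomenclature can be sketched in a few
lines of Python. This is only an illustration: the function name
iapx_name and its parameters are hypothetical, and the limit of two
8089s is an assumption taken from the rows listed in Table 1.

```python
def iapx_name(cpu, n8087=0, n8089=0):
    """Return the iAPX system name for a component mix, following Table 1."""
    if cpu not in ("8086", "8088"):
        raise ValueError("host must be an 8086 or 8088")
    if n8087 not in (0, 1):
        raise ValueError("Table 1 lists at most one 8087")
    if not 0 <= n8089 <= 2:
        # Assumption: only the 8089 counts shown in Table 1 are accepted.
        raise ValueError("Table 1 lists at most two 8089s")
    family = cpu[2:]              # "86" or "88"
    tens = 2 if n8087 else 1      # 1X = host only, 2X = host plus 8087
    return f"iAPX {family}/{tens}{n8089}"   # units digit counts the 8089s


print(iapx_name("8086", n8087=1, n8089=1))
```

The tens digit records whether an 8087 is present and the units digit
counts the 8089s, which is why 86/2X can stand for a numeric data
processor with any number of 8089s.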

NPX OVERVIEW

The 8087 is a coprocessor extension available to
iAPX 86/1X or iAPX 88/1X maximum mode
microprocessor systems. (See page 7.) The 8087
adds hardware support for floating point and
extended precision integer data types, registers, and
instructions. Figure 1 shows the register set
available to the NDP. Figure 2 lists the seven
data types available to numeric instructions.
Each data type has a load and store
instruction. Whether an 8087 or its
emulator is used, the registers and data types
appear the same to the programmer.

All the numeric instructions and data types of the NPX 
are used by the programmer in the same manner as the 
general data types and instructions of the host. 

The numeric data formats and arithmetic operations 
provided by the 8087 conform to the proposed IEEE 
Microprocessor Floating Point Standard. All the pro- 
posed IEEE floating point standard algorithms, excep- 
tion detection, exception handling, infinity arithmetic 
and rounding controls are implemented.¹

The numeric registers of the NPX are provided for fast,
easy reference to values needed in numeric calculations.
All numeric values kept in the NPX register file are held
in the 80-bit register file format, which
is the same as the 80-bit temporary real data type.

All data types are converted to the 80-bit register file
format when used by the NPX. Load and store instructions
automatically convert between the memory
operand data type and the register file format for all
numeric data types. The numeric load instruction
specifies the format in which the memory operand is
expected and which addressing mode to use.

All host base registers, index registers, segment 
registers, and addressing modes are available for 
locating numeric operands. In the same manner, the 
store instruction also specifies which data type to use 
and where the value is located when stored into 
memory. 

Selecting Numeric Data Types 

As figure 2 shows, the numeric data types are of dif- 
ferent lengths and domains (real or integer). Each 
numeric data type is provided for a specific function:

    16-bit word integers     - Index values, loop counts,
                               and small program control
                               values

    32-bit short integers    - Large integer general
                               computation

    64-bit long integers     - Extended range integer
                               computation

    18-digit packed decimal  - Commercial and decimal
                               conversion arithmetic

    32-bit short real        - Reduced range and accuracy
                               is traded for reduced memory
                               requirements

    64-bit long real         - Recommended floating point
                               variable type

    80-bit temporary real    - Format for intermediate or
                               high precision calculations

¹ See “An Implementation Guide to a Proposed Standard for Floating
Point” by Jerome Coonen in Computer, Jan. 1980, or the Oct. 1979
issue of ACM SIGNUM, for more information on the standard.
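
The trade between memory size and accuracy can be illustrated with a
small Python sketch. It assumes that the short real and long real
formats match the 4-byte and 8-byte formats produced by Python's struct
module ("<f" and "<d"), and that packed decimal and temporary real each
occupy 10 bytes; the names SIZES and round_trip are hypothetical.
Python has no 80-bit type, so the temporary real format is not
exercised here.

```python
import struct

# Memory size in bytes of each numeric data type (assumed: bits / 8,
# with the 18-digit packed decimal stored in 10 bytes).
SIZES = {
    "word integer": 2,
    "short integer": 4,
    "long integer": 8,
    "packed decimal": 10,
    "short real": 4,
    "long real": 8,
    "temporary real": 10,
}


def round_trip(value, fmt):
    """Store a value in memory format fmt ('<f' or '<d') and load it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


x = 0.1
short_error = abs(round_trip(x, "<f") - x)   # error introduced by short real
long_error = abs(round_trip(x, "<d") - x)    # long real keeps the value as held
print(short_error > long_error)
```

Storing a value as a short real halves the memory cost of a long real
but rounds away the low-order bits, which is the trade described in the
list above.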


Referencing memory data types in the NDP is not 
restricted to load and store instructions. Some arith- 
metic operations can specify a memory operand in one 
of four possible data types. The numeric instructions 
compare, add, subtract, subtract reversed, multiply, 
divide, and divide reversed can specify a memory 
operand to be either a 16-bit integer, 32-bit integer, 
32-bit real, or 64-bit real value. As with the load and 
store operations, the arithmetic instruction specifies the 
address and expected format of the memory operand. 


The remaining arithmetic operations: square root, 
modulus, tangent, arctangent, logarithm, exponentiate, 
scale power, and extract power use only register 
operands. 


Figure 1. NDP Register Set for iAPX 86/20, 88/20

The register sets of the host and the 8087 are in separate
components. Direct transfer of values between the two
register sets in one instruction is not possible. To transfer
values between the host and numeric register sets,
the value must first pass through memory. The memory
format of a 16-bit word integer used by the NPX is
identical to that of the host, ensuring fast, easy transfers.

Since an 8086 or 8088 does not provide single instruc- 
tion support for the remaining numeric data types, host 
programs reading or writing these data types must con- 
form to the bit and byte ordering established by the 
NPX. 
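
As an illustration of what conforming to the NPX ordering means for a
host program, the sketch below splits a 32-bit short integer into the
16-bit words a host would handle one at a time. It assumes two's
complement integers stored with the low-order word at the lowest
address; the function names are hypothetical.

```python
def split_words(value, n_words):
    """Split a two's complement integer into 16-bit words, low-order word first."""
    mask = (1 << (16 * n_words)) - 1
    raw = value & mask                      # two's complement bit pattern
    return [(raw >> (16 * i)) & 0xFFFF for i in range(n_words)]


def join_words(words):
    """Rebuild the signed integer from 16-bit words, low-order word first."""
    bits = 16 * len(words)
    raw = sum(word << (16 * i) for i, word in enumerate(words))
    if raw & (1 << (bits - 1)):             # sign bit set: value is negative
        raw -= 1 << bits
    return raw


words = split_words(0x12345678, 2)
print([hex(w) for w in words])
```

A host routine that builds or inspects a short or long integer for the
NPX would move these words through memory in the same order.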

Writing programs using numeric instructions is as sim- 
ple as with the host’s instructions. The numeric instruc- 
tions are simply placed in line with the host’s instruc- 
tions. They are executed in the same order as they ap- 
pear in the instruction stream. Numeric instructions 
follow the same form as the host instructions. Figure 2 
shows the ASM 86/88 representations for different 
numeric instructions and their similarity to host instruc- 
tions. 

    FILD    VALUE
    FIADD   TABLE [BX]
    FADD    ST,ST(1)


8087 EMULATOR OVERVIEW 

The NDP has two basic implementations: the 8087 component
or its software emulator (E8087). The decision
to use the emulator or component has no effect on
programs at the source level. At the source level, all
instructions, data types, and features are used the same
way.

The emulator requires all numeric instruction opcodes 
to be replaced with an interrupt instruction. This 
replacement is performed by the LINK86 program. In- 
terrupt vectors in the host’s interrupt vector table will 
point to numeric instruction emulation routines in the 
8087 software emulator. 

When using the 8087 emulator, the linker replaces all the
2-byte wait-escape, nop-escape, wait-segment override,
or nop-segment override sequences generated by an
assembler or compiler for the 8087 component with a
2-byte interrupt instruction. Any remaining bytes of the
numeric instruction are left unchanged.

When the host encounters an emulated numeric instruction,
it will execute the software interrupt instruction
formed by the linker. The interrupt vector table will
direct the host to the proper entry point in the 8087
emulator. Using the interrupt return address and CPU
register set, the host will decode any remaining part of
the numeric instruction, perform the indicated operation,
then return to the next instruction following the
emulated numeric instruction.

One copy of the 8087 emulator can be shared by all pro- 
grams in the host. 

The decision to use the 8087 or software emulator is
made at link time, when all software modules are
brought together. Depending on whether an 8087 or its
software emulator is used, a different group of library
modules is included for linking with the program.

If the 8087 component is used, the libraries do not add
any code to the program; they just satisfy external
references made by the assembler or compiler. Using the
emulator will not increase the size of individual
modules; however, other modules requiring about 16K bytes
that implement the emulator will be automatically
added.

Selecting between the emulator or the 8087 can be very
easy. Different versions of submit files performing the
link operation can specify the different set of
library modules needed. Figure 3 shows an example of
two different submit files for the same program using
the NPX with an 8087 or the 8087 emulator.

iSBC 337™ MULTIMODULE™ Overview

The benefits of the NPX are not limited to systems
which left board space for the 8087 component or memory
space for its software emulator. Any maximum
mode iAPX 86/1X or iAPX 88/1X system can be upgraded
to a numeric processor. The iSBC 337
MULTIMODULE is designed for just this function. The
iSBC 337 provides a socket for the host microprocessor
and an 8087. A 40-pin plug is provided on the underside
of the 337 to plug into the original host’s socket, as
shown in Figure 4. Two other pins on the underside of
the MULTIMODULE allow easy connection to the
8087 INT and RQ/GT1 pins.


8087 BASED LINK/LOCATE COMMANDS

    LINK86 :F1:PROG.OBJ, IO.LIB, 8087.LIB TO :F1:PROG.LNK
    LOC86  :F1:PROG.LNK TO :F1:PROG

SOFTWARE EMULATOR BASED LINK/LOCATE COMMANDS

    LINK86 :F1:PROG.OBJ, IO.LIB, E8087.LIB, E8087 TO :F1:PROG.LNK
    LOC86  :F1:PROG.LNK TO :F1:PROG

Figure 3. Submit File Example

Figure 4. MULTIMODULE Math Mounting Scheme

CONSTRUCTING AN iAPX 86/2X OR iAPX
88/2X SYSTEM

This section will describe how to design a micropro- 
cessor system with the 8087 component. The discussion 
will center around hardware issues. However, some of 
the hardware decisions must be made based upon how 
the software will use the NPX. To better understand 
how the 8087 operates as a local bus master, we shall 
cover how the coprocessor interface works later in this 
section. 


Wiring up the 8087 

The 8087 can be designed into any 86/1X or 88/1X
system operating in maximum mode. Such a system
would be designated an 86/2X or 88/2X. Figure 5 shows
the local bus interconnections for an iAPX 86/20 (or
iAPX 88/20) system. The 8087 shares the maximum
mode host’s multiplexed address/data bus, status
signals, queue status signals, ready status signal, clock, and
reset signal. Two dedicated signals, BUSY and INT,
inform the host of current 8087 status. The 10K pull-down
resistor on the BUSY signal ensures the host will always
see a “not busy” status if an 8087 is not installed.


Adding the 8087 to your design has a minor effect on
hardware timing. The 8087 has the exact same timing
and equivalent DC and AC drive characteristics as a
host or IOP on the local bus. All the local bus logic,
such as clock, ready, and interface logic, is shared.

The 8087 adds 15 pF to the total capacitive loading on 
the shared address/data and status signals. Like the 
8086 or 8088, the 8087 can drive a total of 100 pF 
capacitive load above its own self load and sink 2.0 mA 
DC current on these pins. This AC and DC drive is sufficient
for an 86/21 system with two sets of data
transceivers, address latches, and bus controllers for
two separate busses, an on-board bus and an off-board
MULTIBUS™ using the 8289 bus arbiter.

What to do with the 8087 INT and RQ/GT pins is covered
later in this section.


It is possible to leave a prewired 40-pin socket on the
board for the 8087. Adding the 8087 to such a system is
as easy as just plugging it in. If a program attempts to
execute any numeric instructions without the 8087
installed, they will simply be treated as NOP instructions
by the host. Software can test for the existence of the
8087 by initializing it and then storing the control word.
The program of Figure 6 illustrates this technique.

Figure 5. System Diagram


WHAT IS THE iAPX 86,88
COPROCESSOR INTERFACE?

The idea of a coprocessor is based on the observation 
that hardware specially designed for a function is the 
fastest, smallest, and cheapest implementation. But, it is 
too expensive to incorporate all desired functions in 
general purpose hardware. Few applications could use 
all the functions. To build fast, small, economical sys- 
tems, we need some way to mix and match components 
supporting specialized functions. 

Purpose of the Coprocessor Interface 

The coprocessor interface of the general purpose 8086
or 8088 microprocessor provides a way to attach
specialized hardware in a simple, elegant, and efficient
manner. Because the coprocessor hardware is specialized, it
can perform its job much faster than any general
purpose CPU of similar size and cost. The coprocessor
interface simply requires connection to the host’s local
address/data, status, clock, ready, reset, test, and
request/grant signals. Being attached to the host’s local
bus gives the coprocessor access to all memory and I/O
resources available to the host.

The coprocessor is independent of system configuration.
Using the local bus as the connection point to the
host isolates the coprocessor from the particular system
configuration, since the timing and function of local bus
signals are fixed.

Software’s View of the Coprocessor 

The coprocessor interface allows specialized hardware 
to appear as an integral part of the host’s architecture 
controlled by the host with special instructions. When 
the host encounters these special instructions, both the 
host and coprocessor recognize them and work together 
to perform the desired function. No status polling loops 
or command stuffing sequences are required by soft- 
ware to operate the coprocessor. 

More information is available to a coprocessor than
simply an instruction opcode and a signal to begin
execution. The host’s coprocessor interface can read a
value from memory, or identify a region of memory the
coprocessor should use while performing its function.
All the addressing modes of the host are available to
identify memory based operands to the coprocessor.

Concurrent Execution of Host and 
Coprocessor 

After the coprocessor has started its operation, the host 
may continue on with the program, executing it in par- 
allel while the coprocessor performs the function started 
earlier. The parallel operation of the coprocessor does 
not normally affect that of the host unless the copro- 
cessor must reference memory or I/O-based operands. 
When the host releases the local bus to the coprocessor, 
the host may continue to execute from its internal in- 
struction queue. However, the host must stop when it 
also needs the local bus currently in use by the copro- 
cessor. Except for the stolen memory cycle, the opera- 
tion of the coprocessor is transparent to the host. 

This parallel operation of host and coprocessor is called
concurrent execution. Concurrent execution of instructions
requires less total time than a strictly sequential
execution would. System performance will be higher
with concurrent execution of instructions between the
host and coprocessor.

SYNCHRONIZATION 

In exchange for the higher system performance made 
available by concurrent execution, programs must pro- 
vide what is called synchronization between the host 
and coprocessor. Synchronization is necessary whenever 
the host and coprocessor must use information available 
from the other. Synchronization involves either the host 
or coprocessor waiting for the other to finish an opera- 
tion currently in progress. Since the host executes the 
program, and has program control instructions like 
jumps, it is given responsibility for synchronization. To 
meet this need, a special host instruction exists to syn- 
chronize host operation with a coprocessor. 


Test for the existence of an 8087 in the system. This code will always recognize an 8087
independent of the TEST pin usage on the host. No deadlock is possible. Using the 8087
emulator will not change the function of this code since ESC instructions are used. The word
variable control is used for communication between the 8087 and the host. Note: if an 8087 is
present, it will be initialized. Register ax is not transparent across this code.

        ESC     28, bx          ; FNINIT if 8087 is present. The contents of bx are irrelevant
        XOR     ax, ax          ; These two instructions insert delay while the 8087 initializes itself
        MOV     control, ax     ; Clear initial control word value
        ESC     15, control     ; FNSTCW if 8087 is present
        OR      ax, control     ; Control = 03ffh if 8087 present
        JZ      no_8087         ; Jump if no 8087 is present


Figure 6. Test for Existence of an 8087 
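
The logic of Figure 6 can be modeled in Python to show why the test
cannot be fooled by a missing component. This is a sketch only: the
function detect_8087 is hypothetical, and it assumes that a stored
control word of 03FFh follows FNINIT and that an ESC executed without an
8087 leaves the memory variable untouched.

```python
FNINIT_CONTROL_WORD = 0x03FF   # control word stored after FNINIT (per Figure 6)


def detect_8087(coprocessor_present):
    """Model the Figure 6 sequence; return True when an 8087 answers."""
    ax = 0                      # XOR ax, ax
    control = ax                # MOV control, ax clears the variable
    if coprocessor_present:
        # ESC 15, control acts as FNSTCW: the 8087 writes its control word.
        control = FNINIT_CONTROL_WORD
    # Without an 8087 the ESC is a NOP to the host, so control stays 0.
    ax |= control               # OR ax, control
    return ax != 0              # JZ no_8087 is taken when the result is zero


print(detect_8087(True), detect_8087(False))
```

The variable is cleared before the store, so a non-zero result can only
come from a coprocessor that wrote its control word.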

The host coprocessor synchronization instruction,
called “WAIT”, uses the TEST pin of the host. The
coprocessor can signal that it is still busy to the host via
this pin. Whenever the host executes a WAIT instruction,
it will stop program execution while the TEST input is
active. When the TEST pin becomes inactive, the host
will resume program execution with the next instruction
following the WAIT. While waiting on the TEST pin,
the host can be interrupted at 5 clock intervals;
however, after the TEST pin becomes inactive, the host will
immediately execute the next instruction, ignoring any
pending interrupts between the WAIT and following
instruction.


COPROCESSOR CONTROL 

The host has the responsibility for overall program con- 
trol. Coprocessor operation is initiated by special in- 
structions encountered by the host. These instructions 
are called “ESCAPE” instructions. When the host en- 
counters an ESCAPE instruction, the coprocessor is 
expected to perform the action indicated by the instruc- 
tion. There are 576 different ESCAPE instructions, 
allowing the coprocessor to perform many different 
actions. 

The host’s coprocessor interface requires the copro- 
cessor to recognize when the host has encountered an 
ESCAPE instruction. Whenever the host begins execut- 
ing a new instruction, the coprocessor must look to see 
if it is an ESCAPE instruction. Since only the host 
fetches instructions and executes them, the coprocessor 
must monitor the instructions being executed by the 
host. 

Host Queue Tracking 

The host can fetch an instruction a variable length
of time before the host executes the instruction. This is a
characteristic of the instruction queue of an 8086 or
8088 microprocessor. An instruction queue allows
prefetching instructions during times when the local bus
would be otherwise idle. The end benefit is faster
execution time of host instructions for a given memory
bandwidth.

The host does not externally indicate which instruction 
it is currently executing. Instead, the host indicates 
when it fetches an instruction and when, some time 
later, an opcode byte is decoded and executed. To iden- 
tify the actual instruction the host fetched from its 
queue, the coprocessor must also maintain an instruc- 
tion stream identical to the host’s. 

Instructions can be fetched in byte or word increments, 
depending on the type of host and the destination ad- 
dress of jump instructions executed by the host. When 
the host has filled its queue, it stops prefetching instruc- 
tions. Instructions are removed from the queue a byte at 
a time for decoding and execution. When a jump oc- 
curs, the queue is emptied. The coprocessor follows 
these actions in the host by monitoring the host’s bus 
status, queue status, and data bus signals. Figure 7 
shows how the bus status signals and queue status 
signals are encoded. 

IGNORING I/O PROCESSORS 

The host is not the only local bus master capable of
fetching instructions. An Intel 8089 IOP can generate
instruction fetches on the local bus in the course of
executing a channel program in system memory. In this
case, the status signals S2, S1, and S0 generated by the
IOP are identical to those of the host. The coprocessor
must not interpret these instruction prefetches as going
to the host’s instruction queue. This problem is solved
with a status signal called S6. The S6 signal identifies
when the local bus is being used by the host. When the
host is the local bus master, S6 = 0 during T2 and T3 of
the memory cycle. All other bus masters must set S6 = 1
during T2 and T3 of their instruction prefetch cycles.
Any coprocessor must ignore activity on the local bus
when S6 = 1.

DECODING ESCAPE INSTRUCTIONS

To recognize ESCAPE instructions, the coprocessor 
must examine all instructions executed by the host. 
When the host fetches an instruction byte from its inter- 
nal queue, the coprocessor must do likewise. 

The queue status state, fetch opcode byte, identifies 
when an opcode byte is being examined by the host. At 
the same time, the coprocessor will check if the byte
fetched from its internal instruction queue is an ESCAPE
opcode. If the instruction is not an ESCAPE, the 
coprocessor will ignore it. The queue status signals for 
fetch subsequent byte and flush queue let the 
coprocessor track the host’s queue without knowledge 
of the length and function of host instructions and ad- 
dressing modes. 
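
The queue tracking described above can be sketched as a small Python
model. The queue status codes used here (0 = no operation, 1 = first
byte of opcode, 2 = queue flushed, 3 = subsequent byte) are taken from
the 8086 data sheet rather than from this text and should be checked
against Figure 7; the class ShadowQueue and its method names are
hypothetical.

```python
from collections import deque

# Assumed QS1,QS0 encodings; verify against Figure 7.
QS_NOP, QS_FIRST, QS_FLUSH, QS_NEXT = 0, 1, 2, 3


class ShadowQueue:
    """Copy of the host's instruction queue as kept by a coprocessor."""

    def __init__(self):
        self.queue = deque()
        self.escapes = []        # opcode bytes recognized as ESCAPE

    def fetch(self, data):
        """Record an instruction fetch by the host (S6 = 0): 1 or 2 bytes."""
        self.queue.extend(data)

    def queue_status(self, qs):
        """Apply one queue status code reported by the host."""
        if qs == QS_FLUSH:
            self.queue.clear()               # a jump emptied the host queue
        elif qs in (QS_FIRST, QS_NEXT):
            byte = self.queue.popleft()      # host removed one byte
            # Only a first opcode byte is checked for the 11011 pattern.
            if qs == QS_FIRST and (byte & 0xF8) == 0xD8:
                self.escapes.append(byte)


shadow = ShadowQueue()
shadow.fetch(bytes([0x90, 0xD9]))    # NOP, then an ESCAPE opcode byte
shadow.queue_status(QS_FIRST)        # host starts the NOP
shadow.queue_status(QS_FIRST)        # host starts the ESCAPE
print([hex(b) for b in shadow.escapes])
```

The model never needs instruction lengths: every byte the host takes is
announced on the queue status pins, so the copy stays in step.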

Escape Instruction Encoding 

All ESCAPE instructions start with the high-order
5 bits of the instruction being 11011. They have two
basic forms. The non-memory form, listed here,
initiates some activity in the coprocessor using the nine
available bits of the ESCAPE instruction to indicate
which function to perform.

                         MOD
    1 1 0 1 1 x x x      1 1 x x x x x x

    (x = bit available to select the coprocessor function)

Memory reference forms of the ESCAPE instruction,
shown in Figure 8, allow the host to point out a memory 
operand to the coprocessor using any host memory ad- 
dressing mode. Six bits are available in the memory 
reference form to identify what to do with the memory 
operand. Of course, the coprocessor may not recognize 
all possible ESCAPE instructions, in which case it will 
simply ignore them. 

Memory reference forms of ESCAPE instructions are 
identified by bits 7 and 6 of the byte following the 
ESCAPE opcode. These two bits are the MOD field of 
the 8086 or 8088 effective address calculation byte. 


They, together with the R/M field, bits 2 through 0, 
determine the addressing mode and how many subse- 
quent bytes remain in the instruction. 
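
The encoding rules above can be collected into a short Python decoder.
It is a sketch under stated assumptions: the function name and the
returned field names are hypothetical, and the displacement lengths
follow the standard 8086 addressing-mode rules (MOD = 00 with
R/M = 110 selects a 16-bit direct address).

```python
def decode_escape(b0, b1):
    """Decode the first two instruction bytes as an ESCAPE.

    Returns None if b0 is not an ESCAPE opcode, else a dict describing it.
    """
    if (b0 & 0xF8) != 0xD8:              # high-order 5 bits must be 11011
        return None
    low3 = b0 & 0x7                      # three free bits of the first byte
    mod = (b1 >> 6) & 0x3
    mid3 = (b1 >> 3) & 0x7               # three free bits of the second byte
    rm = b1 & 0x7
    if mod == 3:                         # non-memory form: nine free bits
        return {"memory": False,
                "function": (low3 << 6) | (b1 & 0x3F),
                "disp_bytes": 0}
    if mod == 0:
        disp = 2 if rm == 6 else 0       # 16-bit direct address, or none
    else:
        disp = 1 if mod == 1 else 2      # 8-bit or 16-bit displacement
    return {"memory": True,              # memory form: six free bits
            "function": (low3 << 3) | mid3,
            "disp_bytes": disp}


forms = {(d["memory"], d["function"])
         for b0 in range(0xD8, 0xE0) for b1 in range(256)
         for d in [decode_escape(b0, b1)]}
print(len(forms))
```

Enumerating every second byte gives 64 memory reference functions and
512 non-memory functions, which accounts for the 576 ESCAPE
instructions mentioned earlier.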

Host’s Response to an Escape Instruction 

The host performs one of two possible actions when 
encountering an ESCAPE instruction: do nothing or 
calculate an effective address and read a word value 
beginning at that address. The host ignores the value of 
the word read. ESCAPE instructions change no regis- 
ters in the host other than advancing IP. So, if there is 
no coprocessor, or the coprocessor ignores the ESCAPE 
instruction, the ESCAPE instruction is effectively a 
NOP to the host. Other than calculating a memory ad- 
dress and reading a word of memory, the host makes no 
other assumptions regarding coprocessor activity. 

The memory reference ESCAPE instructions have two 
purposes: identify a memory operand and for certain in- 
structions, transfer a word from memory to the 
coprocessor. 

COPROCESSOR INTERFACE TO MEMORY 

The design of a coprocessor is considerably simplified if
it only requires reading memory values of 16 bits or less.
The host can perform all the reads with the coprocessor
latching the value as it appears on the data bus at the
end of T3 during the memory read cycle. The coprocessor
need never become a local bus master to read or
write additional information.

If the coprocessor must write information to memory, 
or deal with data values longer than one word, then it 
must save the memory address and be able to become a 
local bus master. The read operation performed by the 
host in the course of executing the ESCAPE instruction 
places the 20-bit physical address of the operand on the 
address/data pins during T1 of the memory cycle. At 
this time the coprocessor can latch the address. If the 
coprocessor instruction also requires reading a value, it 
will appear on the data bus during T3 of the memory 
read. All other memory bytes are addressed relative to 
this starting physical address. 


    ESCAPE opcode       MOD     R/M
    1 1 0 1 1 x x x     0 0 x x x 1 1 0     16-bit direct displacement
    1 1 0 1 1 x x x     1 0 x x x R/M       16-bit displacement
    1 1 0 1 1 x x x     0 1 x x x R/M       8-bit displacement
    1 1 0 1 1 x x x     0 0 x x x R/M       no displacement (R/M other than 110)

Figure 8. Memory Reference ESCAPE Instruction Forms

Whether the coprocessor becomes a bus master or not,
if the coprocessor has memory reference instruction 
forms, it must be able to identify the memory read per- 
formed by the host in the course of executing an 
ESCAPE instruction. 

Identifying the memory read is straightforward, requir- 
ing all the following conditions to be met: 

1) A MOD value of 00, 01, or 10 in the second byte 
of the ESCAPE instruction executed by the host, 

2) This is the first data read memory cycle performed 
by the host after it encountered the ESCAPE in- 
struction. In particular, the bus status signals 
S2-S0 will be 101 and S6 will be 0. 

The coprocessor must continue to track the instruction 
queue of the host while it calculates the memory address 
and reads the memory value. This is simply a matter of 
following the fetch subsequent byte status commands 
occurring on the queue status pins. 
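
The two conditions for identifying the operand read can be written as a
single predicate. This is a minimal sketch; the function and parameter
names are hypothetical, and the caller is assumed to count data reads
seen since the ESCAPE opcode was recognized.

```python
READ_MEMORY = 0b101   # S2-S0 code for a data read, as stated in condition 2


def is_escape_operand_read(mod, bus_status, s6, reads_since_escape):
    """Return True for the host read that belongs to a memory ESCAPE."""
    memory_form = mod in (0b00, 0b01, 0b10)       # condition 1
    first_host_read = (bus_status == READ_MEMORY  # condition 2
                       and s6 == 0
                       and reads_since_escape == 0)
    return memory_form and first_host_read


print(is_escape_operand_read(0b01, 0b101, 0, 0))
```

A cycle with S6 = 1 belongs to another bus master and is rejected even
if its status code also indicates a memory read.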

HOST PROCESSOR DIFFERENCES 

A coprocessor must be aware of the bus characteristics 
of the host processor. This determines how the host will 
read the word operand of a memory reference ESCAPE 
instruction. If the host is an 8088, it will always perform 
two byte reads at sequential addresses. But if the host is 
an 8086, it can either perform a single word read or two 
byte reads to sequential addresses. 

The 8086 places no restrictions on the alignment of 
word operands in memory. It will automatically per- 
form two byte operations for word operands starting at 
an odd address. The two operations are necessary since 
the two bytes of the operand exist in two different mem- 
ory words. The coprocessor should be able to accept the 
two possible methods of reading a word value on the 
8086. 

A coprocessor can determine whether the 8086 will perform
one or two memory cycles as part of the current
ESCAPE instruction execution. The AD0 pin during T1
of the first memory read by the host tells if this is the
only read to be performed as part of the ESCAPE
instruction. If this pin is a 1 during T1 of the memory
cycle, the 8086 will immediately follow this memory
read cycle with another one at the next byte address.
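
The host differences can be summarized by listing the read cycles a
coprocessor should expect for the word operand. The sketch below uses a
hypothetical function name and ignores address wrap-around at a segment
boundary.

```python
def escape_read_cycles(host, address):
    """Return the (address, width_in_bytes) reads for the word operand."""
    if host not in ("8086", "8088"):
        raise ValueError("host must be an 8086 or 8088")
    if host == "8086" and (address & 1) == 0:
        # AD0 = 0 during T1: a single word read fetches both bytes.
        return [(address, 2)]
    # 8088, or 8086 with an odd address (AD0 = 1): two byte reads.
    return [(address, 1), (address + 1, 1)]


print(escape_read_cycles("8086", 0x1001))
```

Only the 8086 with an even operand address completes the read in one
cycle; every other case produces two byte reads at sequential
addresses.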

Coprocessor Interface Summary 

The host ESCAPE instructions, coprocessor interface,
and WAIT instruction allow easy extension of the host’s
architecture with specialized processors. The 8087 is
such a processor, extending the host’s architecture as
seen by the programmer. The specialized hardware
provided by the 8087 can greatly improve system
performance economically in terms of both hardware and
software for numerics applications.


The next section examines how the 8087 uses the 
coprocessor interface of the 8086 or 8088. 

8087 COPROCESSOR OPERATION 

The 8086 or 8088 ESCAPE instructions provide 64 
memory reference opcodes and 512 non-memory refer- 
ence opcodes. The 8087 uses 57 of the memory reference 
forms and 406 of the non-memory reference forms. Fig- 
ure 9 shows the ESCAPE instructions not used by the 
8087. 
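
The opcode counts follow from the ESCAPE encoding: three opcode bits in the first byte and three in the second byte select a memory reference form, and the register forms gain three more bits from the r/m field. The short Python calculation below, a sketch with variable names chosen only for this example, shows how many forms remain unused by the 8087.

```python
# ESCAPE opcode space, derived from the instruction encoding.
MEMORY_FORMS = 2 ** (3 + 3)          # 64 memory reference opcodes
NON_MEMORY_FORMS = 2 ** (3 + 3 + 3)  # 512 non-memory reference opcodes

USED_BY_8087_MEMORY = 57       # figure quoted in the text
USED_BY_8087_NON_MEMORY = 406  # figure quoted in the text

free_memory = MEMORY_FORMS - USED_BY_8087_MEMORY
free_non_memory = NON_MEMORY_FORMS - USED_BY_8087_NON_MEMORY
print(free_memory, free_non_memory)  # prints: 7 106
```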
Using the 8087 With Custom Coprocessors

Custom coprocessors that a designer may care to develop
should limit their use of ESCAPE instructions to those
not used by the 8087. This prevents ambiguity about
whether any one ESCAPE instruction is intended for the
numerics coprocessor or for a custom coprocessor. Note
that using any ESCAPE instruction for a custom coprocessor
may conflict with opcodes chosen for future Intel coprocessors.

Operation of an 8087 together with other custom co- 
processors is possible under the following constraints: 

1) All 8087 errors are masked. The 8087 will update its 
opcode and instruction address registers for the un- 
used opcodes. Unused memory reference instruc- 
tions will also update the operand address value. 
Such changes in the 8087 make software-defined 
error handling impossible. 

2) If the coprocessors provide a BUSY signal, they must 
be ORed together for connection to the host TEST 
pin. When the host executes a WAIT instruction, it 
does not know which coprocessor will be affected by 
the following ESCAPE instruction. In general, all 
coprocessors must be idle before executing the 
ESCAPE instruction. 


Operand Addressing by the 8087 

The 8087 has seven different memory operand formats. 
Six of them are longer than one word. All are an even 
number of bytes in length and are addressed by the host 
at the lowest address word. 

When the host executes a memory reference ESCAPE 
instruction intended to cause a read operation by the 
8087, the host always reads the low-order word of any 
8087 memory operand. The 8087 will save the address 
and data read. To read any subsequent words of the 
operand, the 8087 must become a local bus master. 

When the 8087 has the local bus, it increments the 20-bit 
physical address it saved to address the remaining words 
of the operand. 

When the ESCAPE instruction is intended to cause a 
write operation by the 8087, the 8087 will save the ad- 
dress but ignore the data read. Eventually, it will get 
control of the local bus, then perform successive write, 
increment address operations writing the entire data 
value. 
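
The addressing sequence described above can be sketched in Python. The function names are hypothetical, and wrapping the address at 20 bits is an assumption of this sketch rather than something the text states.

```python
def physical_address(segment, offset):
    """20-bit physical address formed by the host: segment * 16 + offset."""
    return ((segment << 4) + offset) & 0xFFFFF


def operand_word_addresses(segment, offset, length_bytes):
    """Addresses of each word of a numeric operand, lowest word first.

    The first address is the one the host references. The others are
    obtained by incrementing the saved physical address, as the 8087 does
    once it owns the local bus.
    """
    if length_bytes % 2:
        raise ValueError("8087 memory operands are an even number of bytes")
    base = physical_address(segment, offset)
    # Assumption: the incremented address wraps within 20 bits.
    return [(base + i) & 0xFFFFF for i in range(0, length_bytes, 2)]


# An 8-byte operand at 1000:0010 (hexadecimal segment:offset).
print([hex(a) for a in operand_word_addresses(0x1000, 0x0010, 8)])
```

Because the 8087 works from the saved physical address, it needs no knowledge of the segment register the host used.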


8087 OPERATION IN iAPX 86,88 SYSTEMS

The 8087 will work with either an 8086 or 8088 host. 
The identity of the host determines the width of the 
local bus path. The 8087 will identify the host and 
adjust its use of the data bus accordingly: 8 bits for an
8088 or 16 bits for an 8086. No strapping options are
required by the 8087; host identification is automatic. 

The 8087 identifies the host each time the host and 8087 
are reset via the RESET pin. After the reset signal goes 
inactive, the host will begin instruction execution at 
memory address FFFF0H.

If the host is an 8086 it will perform a word read at that 
address; an 8088 will perform a byte read. 

The 8087 monitors pin 34 on the first memory cycle 
after power up. If an 8086 host is used, pin 34 will be the 
BHE signal, which will be low for that memory cycle. 
For an 8088 host, pin 34 will be the SS0 signal, which
will be high during T1 of the first memory cycle. Based 
on this signal, the 8087 will then configure its data bus 
width to match that of the host local bus. 
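
As a minimal sketch of this decision, assuming the pin level is sampled as 0 (low) or 1 (high), the host identification reduces to a single test. The function name `identify_host` is hypothetical.

```python
def identify_host(pin34_level):
    """Return (host, data_bus_bits) from pin 34 on the first memory cycle.

    pin34_level: 0 for low, 1 for high. Illustrative model only.
    """
    if pin34_level == 0:
        return ("8086", 16)  # BHE low: the 8086 performs a word read
    if pin34_level == 1:
        return ("8088", 8)   # SS0 high during T1 on an 8088
    raise ValueError("pin34_level must be 0 or 1")


print(identify_host(0))  # prints: ('8086', 16)
print(identify_host(1))  # prints: ('8088', 8)
```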

For 88/2X systems, pin 34 of the 8087 may be tied to
Vcc if not connected to the 8088 SS0 pin.

The width of the data bus and the alignment of data
operands have no effect on numeric instructions, except
for execution time and the number of memory cycles
performed.

A numeric program will always produce the same results 
on an 86/2X or 88/2X with any operand alignment. All 
numeric operands have the same relative byte orderings 
independent of the host and starting address. 

The byte alignment of memory operands can affect the 
performance of programs executing on an 86/2X. If a 
word operand, or any numeric operand, starts on an 
odd-byte address, more memory cycles are required to 
access the operand than if the operand started on an 
even address. The extra memory cycles will lower system 
performance. 

The 86/2X will attempt to minimize the number of extra 
memory cycles required for odd-aligned operands. In 
these cases, the 8087 will perform first a byte operation, 
then a series of word operations, and finally a byte 
operation. 
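
The cost of odd alignment can be counted with a short Python model of the byte, word, ..., byte pattern described above. This is a sketch only; `operand_cycles` is a hypothetical name, and each list entry stands for one memory cycle.

```python
def operand_cycles(address, length_bytes, bus_bits):
    """List the (address, size_in_bytes) memory cycles for one operand."""
    if bus_bits == 8:
        # 88/2X: byte operations are always performed.
        return [(address + i, 1) for i in range(length_bytes)]
    cycles = []
    end = address + length_bytes
    if address & 1:
        # Odd start on an 86/2X: a leading byte operation.
        cycles.append((address, 1))
        address += 1
    while end - address >= 2:
        # Word operations on even addresses.
        cycles.append((address, 2))
        address += 2
    if address < end:
        # A trailing byte operation finishes an odd-aligned operand.
        cycles.append((address, 1))
    return cycles


print(len(operand_cycles(0x100, 8, 16)))  # prints: 4
print(len(operand_cycles(0x101, 8, 16)))  # prints: 5
print(len(operand_cycles(0x101, 8, 8)))   # prints: 8
```

For an 8-byte operand the odd-aligned case costs one extra memory cycle on a 16-bit bus, while the 8-bit bus count does not depend on alignment.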

88/2X instruction timings are independent of operand 
alignment, since byte operations are always performed. 
However, it is recommended to align numeric operands 
on even boundaries for maximum performance in case 
the program is transported to an 86/2X. 
RQ/GT CONNECTION

Two decisions must be made when connecting the 8087
to a system. The first is how to interconnect the RQ/GT
signals of all local bus masters. The second is where to
connect the 8087 INT signal. The RQ/GT decision affects
the response time to service local bus requests from
other local bus masters, such as an 8089 IOP or other
coprocessor. The interrupt connection affects the
response time to service an interrupt request and how
user-interrupt handlers are written. The implications of
how these pins are connected concern both the hardware
designer and the programmer and must be understood by
both.

The RQ/GT issue can be broken into three general cate- 
gories, depending on system configuration: 86/20 or 
88/20, 86/21 or 88/21, and 86/22 or 88/22. Remote 
operation of an IOP is not affected by the 8087 RQ/GT
connection. 

iAPX 86/20, 88/20 

For an 86/20 or 88/20, just connect the RQ/GT0 pin of
the 8087 to RQ/GT1 of the host (see Figure 5), and skip
forward to the interrupt discussion (8087 INT Connection).

iAPX 86/21, 88/21 

For an 86/21 or 88/21, connect RQ/GT0 of the 8087 to
RQ/GT1 of the host, connect RQ/GT of the 8089 to
RQ/GT1 of the 8087 (see Figure 10), and skip
forward to the interrupt discussion (8087 INT Connection).

The RQ/GT1 pin of the 8087 exists to provide one I/O
processor with a low maximum wait time for the local
bus. The maximum wait times to gain control of the
local bus for a device attached to RQ/GT1 of an 8087
for an 8086 or 8088 host are shown in Table 2. These
numbers are all dependent on when the host will release 
the local bus to the 8087. 


As Table 2 implies, three factors determine when the 
host will release the local bus: 

1) What type of host is there, an 8086 or 8088? 

2) What is the current instruction being executed? 

3) How is the lock prefix being used? 

An 8086 host will not release the local bus between the
two consecutive byte operations performed for odd-aligned
word operands. The 8088 always performs a word
transfer as two byte operations and will never release
the local bus between them, regardless of byte alignment.

Host operations such as acknowledging an interrupt will 
not release the local bus for several bus cycles. 

Using a lock prefix in front of a host instruction 
prevents the host from releasing the local bus during the 
execution of that instruction. 

8087 RQ/GT Function 

The presence of the 8087 in the RQ/GT path from the
IOP to the host has little effect on the maximum wait
time seen by the IOP when requesting the local bus. The
8087 adds two clocks of delay to the basic time required
by the host. This low delay is achieved by a preemptive
protocol implemented by the 8087 on RQ/GT1.

The 8087 always gives higher priority to a request for
the local bus from a device attached to its RQ/GT1 pin
than to a request generated internally by the 8087. If the
8087 currently owns the local bus and a request is made
to its RQ/GT1 pin, the 8087 will finish the current
memory cycle and release the local bus to the requestor.
If the request from the device arrives when the 8087
does not own the local bus, then the 8087 will pass the
request on to the host via its RQ/GT0 pin.


Table 2. Worst Case Local Bus Request Wait Times in Clocks

System                  No Locked       Only Locked     Other Locked
Configuration           Instructions    Exchange        Instructions

iAPX 86/21              15 (1)          35 (1)          max (15 (1), *)
even aligned words

iAPX 86/21              15 (1)          43 (2)          max (43 (2), *)
odd aligned words

iAPX 88/21              15 (1)          43 (2)          max (43 (2), *)

Notes: (1) Add two clocks for each wait state inserted per bus cycle
       (2) Add four clocks for each wait state inserted per bus cycle
       *   Execution time of longest locked instruction
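
The table and its notes can be turned into a small calculator. The Python sketch below reads the entries as 15, 35, and 43 clocks with note 1 or note 2 attached; the names `TABLE_2` and `worst_case_wait` and the argument layout are assumptions of this example, not part of the original note.

```python
# (base clocks, extra clocks per wait state) for each Table 2 entry.
TABLE_2 = {
    "86/21 even": {"none": (15, 2), "xchg": (35, 2), "other": (15, 2)},
    "86/21 odd":  {"none": (15, 2), "xchg": (43, 4), "other": (43, 4)},
    "88/21":      {"none": (15, 2), "xchg": (43, 4), "other": (43, 4)},
}


def worst_case_wait(config, locked="none", wait_states=0, longest_locked=0):
    """Worst-case clocks a device on RQ/GT1 of the 8087 waits for the bus.

    locked: "none", "xchg" (only locked exchange), or "other".
    wait_states: wait states inserted per bus cycle.
    longest_locked: execution time of the longest locked instruction,
    used only for locked == "other" (the '*' entry of the table).
    """
    base, per_wait_state = TABLE_2[config][locked]
    clocks = base + per_wait_state * wait_states
    if locked == "other":
        clocks = max(clocks, longest_locked)
    return clocks


print(worst_case_wait("86/21 even"))                  # prints: 15
print(worst_case_wait("88/21", "xchg", 1))            # prints: 47
print(worst_case_wait("86/21 odd", "other", 0, 100))  # prints: 100
```
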
Figure 11. iAPX 86/22 System

iAPX 86/22, 88/22

An 86/22 system offers two alternatives regarding which
IOP an I/O device is connected to. Each IOP will offer
a different maximum delay time to service an I/O
request. (See Fig. 11)

The second 8089 (IOPA) must use the RQ/GT0 pin of
the host. With two IOPs the designer must decide which
IOP services which I/O devices. The choice is determined
by the maximum wait time allowed between when an I/O
device requests IOP service and when the IOP can respond.
The maximum service delay times of the two IOPs can be
very different. It makes little difference which of the
two host RQ/GT pins is used.

The different wait times are due to the non-preemptive
nature of bus grants between the two host RQ/GT pins.
No communication of a need to use the local bus is
possible between IOPA and the 8087/IOPB combination.
Any request for the local bus by IOPA must
wait in the worst case for the host, 8087, and IOPB to
finish their longest sequence of memory cycles. IOPB
must wait in the worst case for the host and IOPA to
finish their longest sequence of memory cycles. The
8087 has little effect on the maximum wait time of
IOPB.

DELAY EFFECTS OF THE 8087 

The delay effects of the 8087 on IOPA can be significant.
When executing special instructions (FSAVE,
FNSAVE, FRSTOR), the 8087 can perform 50 or 96
consecutive memory cycles with an 8086 or 8088 host,
respectively. These instructions do not affect the response
time to local bus requests seen by IOPB.

If the 8087 is performing a series of memory cycles while
executing these instructions, and IOPB requests the
local bus, the 8087 will stop its current memory activity,
then release the local bus to IOPB.

The 8087 cannot release the bus to IOPA since it cannot
know that IOPA wants to use the local bus, as it can
for IOPB.

REDUCING 8087 DELAY EFFECTS 

For 86/22 or 88/22 systems requiring lower maximum
wait times for IOPA, it is possible to reduce the worst
case bus usage of the 8087. If three 8087 instructions
(FSAVE, FNSAVE, and FRSTOR) are never executed,
the maximum number of consecutive memory
cycles performed by the 8087 is 10 or 16 for an 8086
or 8088 host, respectively. The function of these
instructions can be emulated with other 8087 instructions.

Appendix B shows an example of how these three
instructions can be emulated. This improvement does have
a cost: an increased execution time of 427 or 747
additional clocks for an 8086 or 8088, respectively, for the
equivalent save and restore operations. These operations
appear in time-critical context-switching functions
of an operating system or interrupt handler. This
technique has no effect on the maximum wait time seen by
IOPB or on the wait time seen by IOPA due to IOPB.

Which IOP to connect to which I/O device in an 86/22
or 88/22 system will depend on how quickly an I/O
request by the device must be serviced by the IOP. This
maximum time must be greater than the sum of the
maximum delay of the IOP and the maximum wait time
for the IOP to gain control of the local bus.
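
The selection rule can be written as a one-line check. The Python sketch below is illustrative only; the function names are hypothetical and the clock figures are invented purely to exercise the check.

```python
def iop_can_serve(device_deadline, iop_max_delay, bus_max_wait):
    """True if the device's allowed service time exceeds the IOP worst case."""
    return device_deadline > iop_max_delay + bus_max_wait


def choose_iop(device_deadline, candidates):
    """candidates maps an IOP name to (iop_max_delay, bus_max_wait)."""
    return [name for name, (delay, wait) in candidates.items()
            if iop_can_serve(device_deadline, delay, wait)]


# Hypothetical figures in clocks, not taken from any data sheet.
candidates = {"IOPA": (40, 120), "IOPB": (40, 45)}
print(choose_iop(100, candidates))  # prints: ['IOPB']
print(choose_iop(50, candidates))   # prints: []
```

An empty result corresponds to the case where neither IOP responds quickly enough.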

If neither IOP offers a fast enough response time,
consider remote operation of the IOP.

8087 INT Connection 

The next decision in adding the 8087 to an 8086 or 8088 
system is where to attach the INT signal of the 8087. 
The INT pin of the 8087 provides an external indication 
of software-selected numeric errors. The numeric pro- 
gram will stop until something is done about the error. 
Deciding where to connect the INT signal can have im- 
portant consequences on other interrupt handlers. 

WHAT ARE NUMERIC ERRORS? 

A numeric error occurs in the NPX whenever an opera- 
tion is attempted with invalid operands or attempts to 
produce a result which cannot be represented. If an in- 
correct or questionable operation is attempted by a pro- 
gram, the NPX will always indicate the event. Examples 
of errors on the NPX are: 1/0, square root of -1, and
reading from an empty register. For a detailed descrip- 
tion of when the 8087 detects a numeric error, refer to 
the Numerics Supplement. (See Lit. Ref). 

WHAT TO DO ABOUT NUMERIC ERRORS 

Two courses of action are possible when a
numeric error occurs. The NPX can itself handle the 
error, allowing numeric program execution to continue 
undisturbed, or software in the host can handle the 
error. To have the 8087 handle a numeric error, set its 
associated mask bit in the NPX control word. Each 
numeric error may be individually masked. 

The NPX has a default fixup action defined for all pos- 
sible numeric errors when they are masked. The default 
actions were carefully selected for their generality and 
safety. 

For example, the default fixup for the precision error is 
to round the result using the rounding rules currently in 
effect. If the invalid error is masked, the NPX will 
generate a special value called indefinite as the result of 
any invalid operation. 

Any arithmetic operation with an indefinite operand 
will always generate an indefinite result. In this manner, 
the result of the original invalid operation will pro- 
pagate throughout the program wherever it is used. 
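
The same propagation can be observed with IEEE not-a-number values in Python, used here only as a stand-in for the NPX indefinite value. This is an analogy, not a model of the 8087 itself.

```python
import math

indefinite = float("nan")  # stand-in for the NPX "indefinite" value

# Every later arithmetic step that uses the value stays invalid.
result = (indefinite + 1.0) * 2.0 - 3.0
print(math.isnan(result))  # prints: True
```

Testing only the final result is therefore enough to detect that an invalid operation occurred somewhere earlier in the calculation.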

When a questionable operation such as multiplying an 
unnormal value by a normal value occurs, the NPX will 
signal this occurrence by generating an unnormal result. 

The required response by host software to a numeric 
error will depend on the application. The needs of each 
application must be understood when deciding on how 
to treat numeric errors. There are three attitudes 
towards a numeric error: 

1) No response required. Let the NPX perform the 
default fixup. 

2) Stop everything, something terrible has happened! 

3) Oh, not again! But don’t disrupt doing something 
more important. 

SIMPLE ERROR HANDLING 

Some very simple applications may mask all of the 
numeric errors. In this simple case, the 8087 INT signal 
may be left unconnected since the 8087 will never assert 
this signal. If any numeric errors are detected during the 
course of executing the program, the NPX will generate 
a safe result. It is sufficient to test the final results of the 
calculation to see if they are valid. 

Special values like not-a-number (NAN), infinity, in- 
definite, denormals, and unnormals indicate the type 
and severity of earlier invalid or questionable opera- 
tions. 

SEVERE ERROR HANDLING 

For dedicated applications, programs should not gener- 
ate or use any invalid operands. Furthermore, all num- 
bers should be in range. An operand or result outside 
this range indicates a severe fault in the system. This 
situation may arise due to invalid input values, program 
error, or hardware faults. The integrity of the program 
and hardware is in question, and immediate action is re- 
quired. 

In this case, the INT signal can be used to interrupt the 
program currently running. Such an interrupt would be 
of high priority. The interrupt handler responsible for
numeric errors might perform system integrity tests and 
then restart the system at a known, safe state. The 
handler would not normally return to the point of error. 

Unmasked numeric errors are very useful for testing 
programs. Correct use of synchronization (described later)
allows the programmer to find out exactly what 
operands, instruction, and memory values caused the 
error. Once testing has finished, an error then becomes 
much more serious. 


The 8086 Family Numerics Supplement recommends 
masking all errors except invalid. (See Lit. Ref.). In this 
case the NPX will safely handle such errors as 
underflow, overflow, or divide by zero. Only truly ques- 
tionable operations will disturb the numerics program 
execution. 

An example of how infinities and divide by zero can be 
harmless occurs when calculating the parallel resistance 
of several values with the standard formula (Figure 12). 
If R1 becomes zero, the circuit resistance becomes 0. 
With divide by zero and precision masked, the NPX will 
produce the correct result. 
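
The parallel resistance case can be reproduced in Python. Because Python raises an exception on division by zero, the masked response (an infinite result) is written out by hand in a helper; the names `masked_reciprocal` and `parallel_resistance` are hypothetical and the sketch only imitates the NPX default behavior.

```python
import math


def masked_reciprocal(x):
    """1/x with the divide-by-zero error masked: 1/0 yields infinity."""
    if x == 0.0:
        return math.inf
    return 1.0 / x


def parallel_resistance(*resistors):
    """Standard formula: 1 / (1/R1 + 1/R2 + ... + 1/Rn)."""
    total = sum(masked_reciprocal(r) for r in resistors)
    return masked_reciprocal(total)


print(parallel_resistance(4.0, 4.0))           # prints: 2.0
print(parallel_resistance(0.0, 100.0, 200.0))  # prints: 0.0
```

With R1 equal to zero, 1/R1 is infinity, the sum is infinity, and the reciprocal of infinity is zero, which is the correct circuit resistance.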

NUMERIC EXCEPTION HANDLING 

For some applications, a numeric error may not indicate 
a severe problem. The numeric error can indicate that a 
hardware resource has been exhausted, and the software 
must provide more. These cases are called exceptions 
since they do not normally arise. 

Special host software will handle numeric error
exceptions on the infrequent occasions when they occur. In
these cases, numeric exceptions are expected to be
recoverable, although they do not require immediate
service by the host. In
effect, these exceptions extend the functionality of the 
NDP. Examples of extensions are: normalized only 
arithmetic, extending the register stack to memory, or 
tracing special data values. 


                                       1
Equivalent resistance  =  ---------------------------
                            1       1       1
                           ---  +  ---  +  ---
                            R1      R2      R3


Figure 12. Infinity Arithmetic Example
HOST INTERRUPT OVERVIEW

The host has only two possible interrupt inputs, a non- 
maskable interrupt (NMI) and a maskable interrupt 
(INTR). Attaching the 8087 INT pin to the NMI input is 
not recommended. The following problems arise: NMI 
cannot be masked, it is usually reserved for more impor- 
tant functions like sanity timers or loss of power signal, 
and Intel supplied software for the NDP will not sup- 
port NMI interrupts. The INTR input of the host allows 
interrupt masking in the CPU, using an Intel 8259A 
Programmable Interrupt Controller (PIC) to resolve 
multiple interrupts, and has Intel support. 

NUMERIC INTERRUPT CHARACTERISTICS 

Numeric error interrupts are different from regular in- 
struction error interrupts like divide by zero. Numeric 
interrupts from the 8087 can occur long after the 
ESCAPE instruction that started the failing operation. 
For example, after starting a numeric multiply opera- 
tion, the host may respond to an external interrupt and 
be in the process of servicing it when the 8087 detects an 
overflow error. In this case the interrupt is a result of 
some earlier, unrelated program. 

From the point of view of the currently executing
interrupt handler, numeric interrupts can come from only
two sources: the handler itself or a lower priority
program.

To explicitly disable numeric interrupts, it is recom- 
mended that numeric interrupts be disabled at the 8087. 
The code example of Figure 13 shows how to disable 
any pending numeric interrupts then reenable them at 
the end of the handler. This code example can be safely 
placed in any routine which must prevent numeric inter- 
rupts from occurring. Note that the ESCAPE instruc- 
tions act as NOPs if an 8087 is not present in the system. 
It is not recommended to use numeric mnemonics since
they may be converted to emulator calls, which run
comparatively slowly, if the 8087 emulator is used.

Interrupt systems have specific functions like fast 
response to external events or periodic execution of 
system routines. Adding an 8087 interrupt should not 
affect these functions. Desirable goals of any 8087
interrupt configuration are:

— Hide numeric interrupts from interrupt handlers that 
don’t use the 8087. Since they didn’t cause the 
numeric interrupt why should they be interrupted? 

— Avoid adding code to interrupt handlers that don’t 
use the 8087 to prevent interruption by the 8087. 

— Allow other higher priority interrupts to be serviced 
while executing a numeric exception handler. 

— Provide numeric exception handling for interrupt
handlers that use the 8087.

— Avoid deadlock as described in a later section.


; Disable any possible numeric interrupt from the 8087. This code is safe to place in any
; procedure. If an 8087 is not present, the ESCAPE instructions will act as nops. These
; instructions are not affected by the TEST pin of the host. Using the 8087 emulator will not
; convert these instructions into interrupts. A word variable, called control, is required to hold
; the 8087 control word. Control must not be changed until it is reloaded into the 8087.

        ESC     15, control     ; (FNSTCW) Save current 8087 control word
        NOP                     ; Delay while 8087 saves current control
        NOP                     ; register value
        ESC     28, cx          ; (FNDISI) Disable any 8087 interrupts
                                ; Set IEM bit in 8087 control register
                                ; The contents of cx is irrelevant
                                ; Interrupts can now be enabled

;       (Your Code Here)

; Reenable any pending interrupts in the 8087. This instruction does not disturb any 8087 instruction
; currently in progress since all it does is change the IEM bit in the control register.

        TEST    control, 80H    ; Look at IEM bit
        JNZ     $+4             ; If IEM = 1 skip FNENI
        ESC     28, ax          ; (FNENI) reenable 8087 interrupts


Figure 13. Inhibit/Enable 8087 Interrupts
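
To see why the ESC operands 15 and 28 in the figure correspond to the named numeric instructions, the six-bit ESC opcode can be split across the two instruction bytes. The Python sketch below is an illustration under stated assumptions: the function names are hypothetical, only the register form and the direct-address memory form are covered, and the address 0x0200 is an arbitrary example.

```python
AX, CX = 0, 1  # 8086 register numbers used in the r/m field


def esc_register_form(opcode, rm):
    """Encode 'ESC opcode, register' (mod = 11) as two bytes."""
    if not (0 <= opcode < 64 and 0 <= rm < 8):
        raise ValueError("opcode is 6 bits, r/m is 3 bits")
    first = 0xD8 | (opcode >> 3)              # 11011xxx
    second = 0xC0 | ((opcode & 7) << 3) | rm  # 11 yyy r/m
    return bytes([first, second])


def esc_direct_memory_form(opcode, address):
    """Encode 'ESC opcode, variable' with a direct 16-bit address
    (mod = 00, r/m = 110, followed by the offset, low byte first)."""
    if not 0 <= opcode < 64:
        raise ValueError("opcode is 6 bits")
    first = 0xD8 | (opcode >> 3)
    second = ((opcode & 7) << 3) | 0x06
    return bytes([first, second, address & 0xFF, (address >> 8) & 0xFF])


print(esc_register_form(28, CX).hex())            # prints: dbe1 (FNDISI)
print(esc_register_form(28, AX).hex())            # prints: dbe0 (FNENI)
print(esc_direct_memory_form(15, 0x0200).hex())   # prints: d93e0002 (FNSTCW)
```
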
Recommended Interrupt Configurations

Five categories cover most uses of the 8087 interrupt in 
fixed priority interrupt systems. For each category, an 
interrupt configuration is suggested based on the goals 
mentioned above. 

1. All errors on the 8087 are always masked. 
Numeric interrupts are not possible. Leave the 
8087 INT signal unconnected. 

2. The 8087 is the only interrupt in the system.
Connect the 8087 INT signal directly to the host’s
INTR input. (See Figure 14.) A bus
driver supplies interrupt vector 10H for
compatibility with Intel supplied software.

3. The 8087 interrupt is a stop everything event. 
Choose a high priority interrupt input that will ter- 
minate all numerics related activity. This is a 
special case since the interrupt handler may never 
return to the point of interruption (i.e. reset the 
system and restart rather than attempt to continue 
operation). 

4. Numeric exceptions or numeric programming er- 
rors are expected and all interrupt handlers either 
don’t use the 8087 or only use it with all errors 
masked. Use the lowest priority interrupt input. 
The 8087 interrupt handler should allow further 
interrupts by higher priority events. The PIC’s 
priority system will automatically prevent the 8087 
from disturbing other interrupts without adding 
extra code to them. 


5. Case 4 holds except that interrupt handlers may 
also generate numeric interrupts. Connect the 8087 
INT signal to multiple interrupt inputs. One input 
would still be the lowest priority input as in case 4. 
Interrupt handlers that may generate a numeric in- 
terrupt will require another 8087 INT connection 
to the next highest priority interrupt. Normally the 
higher priority numeric interrupt inputs would be 
masked and the low priority numeric interrupt 
enabled. The higher priority interrupt input would 
be unmasked only when servicing an interrupt 
which requires 8087 exception handling. 

All of these configurations hide the 8087 from all inter- 
rupt handlers which do not use the 8087. Only those in- 
terrupt handlers that use the 8087 are required to per- 
form any special 8087 related interrupt control ac- 
tivities. 

A conflict can arise between the desired PIC interrupt 
input and the required interrupt vector of 10H for
compatibility with Intel software for numeric interrupts. A
simple solution is to use more than one interrupt vector 
for numeric interrupts, all pointing at the same 8087 in- 
terrupt handler. Design the numeric interrupt handler 
such that it need not know what the interrupt vector was 
(i.e. don’t use specific EOI commands). 

If an interrupt system uses rotating interrupt priorities, 
it will not matter which interrupt input is used. 
Figure 14. iAPX 86/20 With Numerics Interrupt Only

GETTING STARTED IN SOFTWARE

Now we are ready to run numeric programs. Developing 
numeric software will be a new experience to some pro- 
grammers. This section of the application note is aimed 
at describing the programming environment and pro- 
viding programming guidelines for the NPX. The term 
NPX is used to emphasize that no distinction is made 
between the 8087 component or an emulated 8087. 

Two major areas of numeric software can be identified: 
systems software and applications software. Products 
such as iRMX™ 86 provide system software as an off- 
the-shelf product. Some applications use specially 
developed systems software optimized to their needs. 

Whether the system software is specially tailored or 
common, it shares issues such as using concurrency,
maintaining synchronization between the host and 8087, 
and establishing programming conventions. Appli- 
cations software directly performs the functions of the 
application. All applications will be concerned with ini- 
tialization and general programming rules for the NPX. 
Systems software will be more concerned with context 
switching, use of the NPX by interrupt handlers, and 
numeric exception handlers. 


How to Initialize the NPX 

The first action required by the NPX is initialization. 
This places the NPX in a known state, unaffected by 
other activity performed earlier. This initialization is 
similar to that caused by the RESET signal of the 8087. 
All the error masks are set, all registers are tagged
empty, the TOP field is set to 0, and the default rounding,
precision, and infinity controls are selected. The 8087 emulator
requires more initialization than the component. Before 
the emulator may be used, all its interrupt vectors must 
be set to point to the correct entry points within the 
emulator. 

To provide compatibility between the emulator and 
component in this special case, a call to an external pro- 
cedure should be used before the first numeric instruc- 
tion. In ASM86 the programmer must call the external 
function INIT87. (Fig. 15). For PLM86, the 
programmer must call the built-in function 
INIT$REAL$MATH$UNIT. PLM86 will call INIT87 
when executing the INIT$REAL$MATH$UNIT built- 
in function. 

The function supplied for INIT87 will be different, 
depending on whether the emulator library, called 
E8087.LIB, or the component library, called 8087.LIB,
was used at link time. INIT87 will execute either an
FNINIT instruction for the 8087 or initialize the 8087 
emulator interrupt vectors, as appropriate. 
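
The initialized state can be inspected by decoding the control word. The Python sketch below assumes the control word reads 03FFH after initialization and assumes the usual 8087 field positions (error masks in bits 0 to 5, IEM in bit 7, precision in bits 8 and 9, rounding in bits 10 and 11, infinity control in bit 12). Apart from the IEM bit, these details come from general 8087 documentation rather than from this note and should be checked against the data sheet.

```python
def decode_control_word(cw):
    """Split an 8087 control word into its fields (assumed layout)."""
    return {
        "error_masks": cw & 0x3F,     # one mask bit per numeric error
        "iem": (cw >> 7) & 1,         # interrupt enable mask
        "precision": (cw >> 8) & 3,   # 3 = 64-bit precision
        "rounding": (cw >> 10) & 3,   # 0 = round to nearest
        "infinity": (cw >> 12) & 1,   # 0 = projective
    }


state = decode_control_word(0x03FF)  # assumed value after initialization
print(state)
```

With all six mask bits set, every numeric error receives the default fixup until the program changes the control word.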


Concurrency Overview 

With the NPX initialized, the next step in writing a 
numeric program is learning about concurrent execution 
within the NDP. 

Concurrency is a special feature of the 8087, allowing it
and the host to simultaneously execute different
instructions. The 8087 emulator does not provide concurrency
since it is implemented by the host.

The benefit of concurrency to an application is higher 
performance. All Intel high level languages automatic- 
ally provide for and manage concurrency in the NDP. 
However, in exchange for the added performance, the 
assembly language programmer must understand and 
manage some areas of concurrency. This section is for 
the assembly language programmer or well-informed, 
high level language programmer. 

Whether the 8087 emulator or component is used, care 
should be taken by the assembly language programmer 
to follow the rules described below regarding synchro- 
nization. Otherwise, the program may not function cor- 
rectly with current or future alternatives for implement- 
ing the NDP. 

Concurrency is possible in the NDP because both the 
host and 8087 have separate arithmetic and control 
units. The host and coprocessor automatically decide 
who will perform any single instruction. The existence 
of the 8087 as a separate unit is not normally apparent. 

Numeric instructions, which will be executed by the 
8087, are simply placed in line with the instructions for 
the host. Numeric instructions are executed in the same 
order as they are encountered by the host in its instruc- 
tion stream. Since operations performed by the 8087 
generally require more time than operations performed 
by the host, the host can execute several of its instruc- 
tions while the 8087 performs one numeric operation. 


IN PLM86:

        CALL INIT$REAL$MATH$UNIT;

IN ASM86:

        EXTRN   INIT87:FAR
                .
                .
                .
        CALL    INIT87


Figure 15. 8087 Initialization
MANAGING CONCURRENCY

Concurrent execution of the host and 8087 is easy to 
establish and maintain. The activities of numeric pro- 
grams can be split into two major areas: program con- 
trol and arithmetic. The program control part performs 
activities like deciding what functions to perform, calcu- 
lating addresses of numeric operands, and loop control. 
The arithmetic part simply performs the adds, sub- 
tracts, multiplies, and other operations on the numeric 
operands. The NPX and host are designed to handle 
these two parts separately and efficiently. 

Managing concurrency is necessary because the arithme- 
tic and control areas must converge to a well-defined 
state when starting another numeric operation. A
well-defined state means all previous arithmetic and control
operations are complete and valid. 

Normally, the host waits for the 8087 to finish the cur- 
rent numeric operation before starting another. This 
waiting is called synchronization. 

Managing concurrent execution of the 8087 involves 
three types of synchronization: instruction, data, and 
error. Instruction and error synchronization are 
automatically provided by the compiler or assembler. 
Data synchronization must be provided by the assembly 
language programmer or compiler.


Instruction Synchronization 

Instruction synchronization is required because the 8087 
can only perform one numeric operation at a time. Be- 
fore any numeric operation is started, the 8087 must 
have completed all activity from previous instructions. 

The WAIT instruction on the host lets it wait for the 
8087 to finish all numeric activity before starting an- 
other numeric instruction. The assembler automatically 
provides for instruction synchronization since a WAIT 
instruction is part of most numeric instructions. A 
WAIT instruction requires 1 byte code space and 2.5 
clocks average execution time overhead. 

Instruction synchronization as provided by the assem- 
bler or a compiler allows concurrent operation in the 
NDP. An execution time comparison of NDP concur- 
rency and non-concurrency is illustrated in Figure 16. 
The non-concurrent program places a WAIT instruction
immediately after the multiply ESCAPE instruction.
The 8087 must complete the multiply operation
before the host executes the MOV instruction on
statement 2. In contrast, the concurrent example allows
the host to calculate the effective address of the next
operand while the 8087 performs the multiply. The
execution time of the concurrent technique is the longer
of the host’s execution time from line 2 to 5 and the
execution time of the 8087 for statement 1. The
execution time of the non-concurrent example is the
sum of the execution times of statements 1 to 5.
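
The timing comparison can be sketched numerically. The Python model below reads the concurrent time as the longer of the 8087 time for the first multiply and the host time for the following statements, and the non-concurrent time as their sum. The clock counts are hypothetical values chosen for illustration, not measured figures.

```python
def non_concurrent_time(npx_clocks, host_clocks):
    """Host waits for the multiply, then runs its own statements."""
    return npx_clocks + sum(host_clocks)


def concurrent_time(npx_clocks, host_clocks):
    """Host statements overlap the multiply; the longer side dominates."""
    return max(npx_clocks, sum(host_clocks))


# Hypothetical clock counts: one numeric multiply and four host statements.
multiply = 140
host = [10, 30, 2, 20]

print(non_concurrent_time(multiply, host))  # prints: 202
print(concurrent_time(multiply, host))      # prints: 140
```

The saving is largest when the host work and the numeric operation take about the same time.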


; This code macro defines two instructions which do not allow any concurrency of execution with
; the host. A register version and memory version of the instruction is shown. It is assumed that the
; 8087 is always idle from the previous instruction. Allow space for emulator fixups.

R233    Record  RF6:2, Mid3:3, RF7:3

CodeMacro NCMUL dst:T, src:F
        RNfix   000B
        R233    (11B, 001B, src)
        RWfix
EndM

CodeMacro NCMUL memop:Mq
        RNfixM  100B, memop
        ModRM   001B, memop
        RWfix
EndM


Statement   Concurrent                Non Concurrent

1           FMUL    st(0), st(1)      NCMUL   st(0), st(1)
2           MOV     ax, size A        MOV     ax, size A
3           MUL     index             MUL     index
4           MOV     bx, ax            MOV     bx, ax
5           FMUL    A [bx]            NCMUL   A [bx]

Figure 16. Concurrent Versus Non-Concurrent Program 
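
The timing relationship described for Figure 16 can be sketched as a small model. The Python below is illustrative only: the function names and the clock counts in the usage lines are assumptions made for this example, not measured 8086 or 8087 timings. It treats statements 2 to 5 as host work that either overlaps the multiply of statement 1 or follows it.

```python
def concurrent_clocks(multiply_clocks, host_clocks):
    """Elapsed clocks when the host runs statements 2-5 during the multiply."""
    return max(multiply_clocks, sum(host_clocks))


def non_concurrent_clocks(multiply_clocks, host_clocks):
    """Elapsed clocks when the host waits for the multiply before statement 2."""
    return multiply_clocks + sum(host_clocks)


# Hypothetical clock counts: one multiply, then host statements 2 to 5.
multiply = 140
host = [4, 120, 2, 20]

print(concurrent_clocks(multiply, host))      # 146
print(non_concurrent_clocks(multiply, host))  # 286
```

With these assumed figures the host work slightly exceeds the multiply time, so the concurrent version hides the multiply almost completely.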

Data Synchronization

Managing concurrency requires synchronizing data ref- 
erences by the host and 8087. 

Figure 17 shows four possible cases of the host and 8087 
sharing a memory value. The second two cases require 
the FWAIT instruction shown for data synchronization. 
In the first two cases, the host will finish with the 
operand I before the 8087 can reference it. The 
coprocessor interface guarantees this. In the second two 
cases, the host must wait for the 8087 to finish with the 
memory operand before proceeding to reuse it. The 
FWAIT instruction in case 3 forces the host to wait for 
the 8087 to read I before changing it. In case 4, the 
FWAIT prevents the host from reading I before the 
8087 sets its value. 

Obviously, the programmer must recognize any form of 
the two cases shown above which require explicit data 
synchronization. Data synchronization is not a concern 
when the host and 8087 are using different memory 
operands during the course of one numeric instruction. 
Figure 16 shows such an example of the host performing 
activity unrelated to the current numeric instruction 
being executed by the 8087. Correct recognition of these 
cases by the programmer is the price to be paid for pro- 
viding concurrency at the assembly language level. 

Automatic Data Synchronization 

Two methods exist to avoid the need for manual recog- 
nition of when data synchronization is needed: use a 
high level language which will automatically establish 
concurrency and manage it, or sacrifice some perfor- 
mance for automatic data synchronization by the as- 
sembler. 

When a high level language is not adequate, the 
assembler can be changed to always place a WAIT in- 
struction after the ESCAPE instruction. Figure 18 
shows an example of how to change the ASM86 code 
macro for the FIST instruction to automatically place 
an FWAIT instruction after the ESCAPE instruction. 
The lack of any possible concurrent execution between 
the host and 8087 while the FIST instruction is executing 
is the price paid for automatic data synchronization. 

An explicit FWAIT instruction for data synchronization
can be eliminated by using a subsequent numeric
instruction. After this subsequent instruction has 
started execution, all memory references in earlier 
numeric instructions are complete. Reaching the next 
host instruction after the synchronizing numeric instruc- 
tion indicates previous numeric operands in memory are 
available. 


The data synchronization purpose of any FWAIT or 
numeric instruction must be well documented. Other- 
wise, a change to the program at a later time may 
remove the synchronizing numeric instruction, causing 
program failure, as: 

FISTP I 

FMUL 

MOV AX, I ; I is safe to use 
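
The data synchronization rules lend themselves to a mechanical check. The following Python sketch is one possible way to express them, not a tool from Intel: the `Instr` record and the function name are hypothetical, each instruction is reduced to the unit that executes it and the memory operand it touches, and the automatically synchronized control instructions (FSTSW, FLDCW and so on) are not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instr:
    unit: str                      # "8087", "host", or "wait"
    operand: Optional[str] = None  # memory operand name, None if registers only


def unsynchronized_host_refs(program: List[Instr]) -> List[int]:
    """Return indexes of host instructions that touch a memory operand
    the 8087 may still be using."""
    pending = None  # operand of a numeric instruction that may still be running
    flagged = []
    for index, instr in enumerate(program):
        if instr.unit == "wait":
            pending = None  # FWAIT: the 8087 has finished
        elif instr.unit == "8087":
            # The WAIT built into a numeric instruction means all earlier
            # numeric memory references are complete once it starts.
            pending = instr.operand
        elif instr.operand is not None and instr.operand == pending:
            flagged.append(index)
    return flagged


# FISTP I / MOV AX, I with no synchronization: the MOV is flagged.
print(unsynchronized_host_refs([Instr("8087", "I"), Instr("host", "I")]))
# FISTP I / FMUL / MOV AX, I: the FMUL acts as the synchronizing instruction.
print(unsynchronized_host_refs(
    [Instr("8087", "I"), Instr("8087"), Instr("host", "I")]))
```

The first call reports index 1 and the second reports nothing, which mirrors the two ways of protecting I described above.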


Case 1:                         Case 3:
        MOV    I, 1                     FILD   I
        FILD   I                        FWAIT
                                        MOV    I, 5

Case 2:                         Case 4:
        MOV    AX, I                    FISTP  I
        FISTP  I                        FWAIT
                                        MOV    AX, I

Figure 17. Data Exchange Example


; This is a code macro to redefine the FIST
; instruction to prevent any concurrency
; while the instruction runs. A wait
; instruction is placed immediately after the
; escape to ensure the store is done
; before the program may continue. This
; code macro will work with the 8087
; emulator, automatically replacing the
; wait escape with a nop.

CodeMacro FIST memop:Mw
    RfixM   111B, memop
    ModRM   010B, memop
    RWfix
EndM


Figure 18. Non-Concurrent FIST Instruction Code Macro

DATA SYNCHRONIZATION RULES EXCEPTIONS

There are five exceptions to the above rules for data syn- 
chronization. The 8087 automatically provides data syn- 
chronization for these cases. They are necessary to 
avoid deadlock (described on page 24). The instructions 
FSTSW/FNSTSW, FSTCW/FNSTCW, FLDCW, 
FRSTOR, and FLDENV do not require any waiting by 
the host before it may read or modify the referenced 
memory location. 

The 8087 provides the data synchronization by prevent- 
ing the host from gaining control of the local bus while 
these instructions execute. If the host cannot gain con- 
trol of the local bus, it cannot change a value before the 
8087 reads it, or read a value before the 8087 writes into 
it. 

The coprocessor interface guarantees that, when the 
host executes one of these instructions, the 8087 will 
immediately request the local bus from the host. This 
request is timed such that, when the host finishes the 
read operation identifying the memory operand, it will 
always grant the local bus to the 8087 before the host 
may use the local bus for a data reference while execut- 
ing a subsequent instruction. The 8087 will not release 
the local bus to the host until it has finished executing 
the numeric instruction. 


Error Synchronization 

Numeric errors can occur on almost any numeric in- 
struction at any time during its execution. Page 15 
describes how a numeric error may have many inter- 
pretations, depending on the application. Since the re- 
sponse to a numeric error will depend on the applica- 
tion, this section covers topics common to all uses of the 
NPX. We will review why error synchronization is need- 
ed and how it is provided. 

Concurrent execution of the host and 8087 requires syn- 
chronization for errors just like data references and 
numeric instructions. In fact, the synchronization re- 
quired for data and instructions automatically provides 
error synchronization. 

However, incorrect data or instruction synchronization 
may not cause a problem until a numeric error occurs. A 
further complication is that a programmer may not ex- 
pect his numeric program to cause numeric errors, but 
in some systems they may regularly happen. To better 
understand these points, let’s look at what can happen 
when the NPX detects an error. 


ERROR SYNCHRONIZATION FOR EXTENSIONS 

The NPX can provide a default fixup for all numeric 
errors. A program can mask each individual error type 
to indicate that the NPX should generate a safe, reason- 
able result. The default error fixup activity is simply 
treated as part of the instruction which caused the error. 
No external indication of the error will be given. A flag 
in the numeric status register will be set to indicate that 
an error was detected, but no information regarding 
where or when will be available. 

If the NPX performs its default action for all errors, 
then error synchronization is never exercised. But this is 
no reason to ignore error synchronization. 

Another alternative exists to the NPX default fixup of 
an error. If the default NPX response to numeric errors 
is not desired, the host can implement any form of re- 
covery desired for any numeric error detectable by the 
NPX. When a numeric error is unmasked, and the error 
occurs, the NPX will stop further execution of the 
numeric instruction. The 8087 will signal this event on 
the INT pin, while the 8087 emulator will cause
interrupt 16 (10H) to occur. The 8087 INT signal is normally
connected to the host’s interrupt system. Refer to page 18
for further discussion on wiring the 8087 INT pin. 

Interrupting the host is a request from the NPX for 
that further numeric program execution under the
arithmetic and programming rules of the NPX is
unreasonable. Error synchronization serves to ensure the NDP is
in a well defined state after an unmasked numeric error
has occurred. Without a well defined state, it is impossible to
figure out why the error occurred.

Allowing a correct analysis of the error is the heart of 
error synchronization. 

NDP ERROR STATES 

If concurrent execution is allowed, the state of the host 
when it recognizes the interrupt is undefined. The host 
may have changed many of its internal registers and be 
executing a totally different program by the time it is in- 
terrupted. To handle this situation, the NPX has special 
registers updated at the start of each numeric instruction 
to describe the state of the numeric program when the 
failed instruction was attempted. (See Lit. Ref. p. iii) 

Besides programmer comfort, a well-defined state is im- 
portant for error recovery routines. They can change the 
arithmetic and programming rules of the 8087. These 
changes may redefine the default fixup from an error, 
change the appearance of the NPX to the programmer, 
or change how arithmetic is defined on the NPX. 

EXTENSION EXAMPLES

A change to an error response might be to automatically 
normalize all denormals loaded from memory. A 
change in appearance might be extending the register 
stack to memory to provide an “infinite” number of 
numeric registers. The arithmetic of the 8087 can be
changed to automatically extend the precision and range 
of variables when exceeded. All these functions can be 
implemented on the NPX via numeric errors and 
associated recovery routines in a manner transparent to 
the programmer. 

Without correct error synchronization, numeric 
subroutines will not work correctly in the above situa- 
tions. 

Incorrect Error Synchronization 

An example of how some instructions written without 
error synchronization will work initially, but fail when 
moved into a new environment is: 

FILD COUNT 

INC COUNT 

FSQRT 

Three instructions are shown to load an integer, calcu- 
late its square root, then increment the integer. The 
coprocessor interface of the 8087 and synchronous ex- 
ecution of the 8087 emulator will allow this program to 
execute correctly when no errors occur on the FILD in- 
struction. 

But, this situation changes if the numeric register stack 
is extended to memory on an 8087. To extend the NPX 
stack to memory, the invalid error is unmasked. A push 
to a full register or pop from an empty register will 
cause an invalid error. The recovery routine for the
error must recognize this situation, fix up the stack, then
perform the original operation.

The recovery routine will not work correctly in the ex- 
ample. The problem is that there is no guarantee that 
COUNT will not be incremented before the 8087 can in- 
terrupt the host. If COUNT is incremented before the 
interrupt, the recovery routine will load a value of 
COUNT one too large, probably causing the program to 
fail. 
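
The failure can be replayed with a small model. The Python sketch below is an illustration under stated assumptions, not a description of 8087 timing: it assumes the first FILD COUNT hits a full register stack, and it assumes the worst case in which the host only recognizes the 8087 interrupt at the WAIT in front of the next numeric instruction. The function name and the reordered second program are introduced here for comparison only.

```python
def count_reloaded_by_recovery(program, count):
    """Return the value of COUNT that the recovery routine reloads."""
    memory = {"COUNT": count}
    fault_pending = False
    for instruction in program:
        if instruction.startswith("F"):       # numeric: begins with a WAIT
            if fault_pending:
                return memory["COUNT"]        # handler repeats FILD COUNT now
            if instruction == "FILD COUNT":
                fault_pending = True          # assumed: stack full, invalid error
        elif instruction == "INC COUNT":
            memory["COUNT"] += 1              # host runs ahead of the 8087
    return memory["COUNT"]


as_written = ["FILD COUNT", "INC COUNT", "FSQRT"]
reordered = ["FILD COUNT", "FSQRT", "INC COUNT"]

print(count_reloaded_by_recovery(as_written, 5))  # 6, one too large
print(count_reloaded_by_recovery(reordered, 5))   # 5, the intended value
```

In the reordered sequence the WAIT of the FSQRT is reached before COUNT changes, so the recovery routine sees the value the program intended to load.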

Error Synchronization and WAITs

Error synchronization relies on the WAIT instructions 
required by instruction and data synchronization and 
the INT and BUSY signals of the 8087. When an unmasked
error occurs in the 8087, it asserts the BUSY
and INT signals. The INT signal is to interrupt the host, 
while the BUSY signal prevents the host from destroy- 
ing the current numeric context. 


The BUSY signal will never go inactive during a numeric 
instruction which asserts INT. 

The WAIT instructions supplied for instruction syn- 
chronization prevent the host from starting another 
numeric instruction until the current error is serviced. In
the same way, the WAIT instructions supplied for data
synchronization prevent the host from prematurely
reading a value not yet stored by the 8087, or
overwriting a value not yet read by the 8087.

The host has two responsibilities when handling 
numeric errors. 1.) It must not disturb the numeric con- 
text when an error is detected, and 2.) it must clear the 
numeric error and attempt recovery from the error. The 
recovery program invoked by the numeric error may 
resume program execution after proper fixup, display 
the state of the NDP for programmer action, or simply 
abort the program. In any case, the host must do 
something with the 8087. With the INT and BUSY 
signals active, the 8087 cannot perform any useful 
work. Special instructions exist for controlling the 8087 
when in this state. Later, an example is given of how to 
save the state of the NPX with an error pending. (See 
page 29) 


Deadlock 

An undesirable situation may result if the host cannot 
be interrupted by the 8087 when asserting INT. This sit- 
uation, called deadlock, occurs if the interrupt path 
from the 8087 to the host is broken. 

The 8087 BUSY signal prevents the host from executing 
further instructions (for instruction or data syn- 
chronization) while the 8087 waits for the host to service 
the exception. The host is waiting for the 8087 to finish 
the current numeric operation. Both the host and 8087 
are waiting on each other. This situation is stable unless 
the host is interrupted by some other event. 

Deadlock has varying effects on the NDP’s performance.
If no other interrupts in the system are possible,
the NDP will wait forever. If other interrupts can arise,
then the NDP can perform other functions, but the
affected numeric program will remain “frozen”.

SOLVING DEADLOCK 

Finding the break in the interrupt path is simple. Look
for a disabled interrupt enable in the host, explicitly
masked interrupt request in the interrupt controller,
implicitly masked interrupt request in the interrupt
controller due to a higher priority interrupt in service,
or other gate functions, usually in TTL, on the host
interrupt signal.

DEADLOCK AVOIDANCE

Application programmers should not be concerned with 
deadlock. Normally, applications programs run with 
unmasked numeric errors able to interrupt them. Dead- 
lock is not possible in this case. Traditionally, systems 
software or interrupt handlers may run with numeric in- 
terrupts disabled. Deadlock prevention lies in this do- 
main. The golden rule to abide by is: “Never wait on the 
8087 if an unmasked error is possible and the 8087 inter- 
rupt path may be broken.” 
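
The golden rule reduces to a simple predicate. The Python sketch below is only a way of writing the rule down: the `InterruptPath` record and its field names are hypothetical, chosen to match the places where the interrupt path can be broken that are listed under SOLVING DEADLOCK.

```python
from dataclasses import dataclass


@dataclass
class InterruptPath:
    host_interrupts_enabled: bool = True
    masked_in_controller: bool = False        # explicitly masked request
    higher_priority_in_service: bool = False  # implicitly masked request
    external_gate_closed: bool = False        # other gating on the signal

    def intact(self) -> bool:
        return (self.host_interrupts_enabled
                and not self.masked_in_controller
                and not self.higher_priority_in_service
                and not self.external_gate_closed)


def wait_is_safe(unmasked_error_possible: bool, path: InterruptPath) -> bool:
    """Never wait on the 8087 if an unmasked error is possible and the
    8087 interrupt path may be broken."""
    return (not unmasked_error_possible) or path.intact()


print(wait_is_safe(True, InterruptPath()))                               # True
print(wait_is_safe(True, InterruptPath(host_interrupts_enabled=False)))  # False
```

An interrupt handler running with host interrupts disabled falls into the second case, which is why such code relies on the no-wait control instructions.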

Error Synchronization Summary 

In summary, error synchronization involves protecting 
the state of the 8087 after an exception. Although not all 
applications may initially require error synchronization, 
it is just good programming practice to follow the rules. 
The advantage of being a “good” numerics program- 
mer is generality of your program so it can work in 
other, more general environments. 

Summary 

Synchronization is the price for concurrency in the 
NDP. Intel high level language compilers will auto- 
matically provide concurrency and manage it with syn- 
chronization. The assembly language programmer can 
choose between using concurrency or not. Placing a 
WAIT instruction immediately after any numeric
instruction will prevent concurrency and avoid
synchronization concerns.

The rules given above are complete and allow concur- 
rency to be used to full advantage. 

Synchronization and the Emulator 

The above discussion on synchronization takes on 
special meaning with the 8087 emulator. The 8087 emu- 
lator does not allow any concurrency. All numeric 
operand memory references, error tests, and wait for 
instruction completion occur within the emulator. As a 
result, programs which do not provide proper instruc- 
tion, data, or error synchronization may work with the 
8087 emulator while failing on the component. 

Correct programs for the 8087 work correctly on the 
emulator. 

Special Control Instructions of the NPX 

The special control instructions of the NPX: FNINIT, 
FNSAVE, FNSTENV, FRSTOR, FLDENV, FLDCW, 
FNSTSW, FNSTCW, FNCLEX, FNENI, and FNDISI 
remove some of the synchronization requirements men- 
tioned earlier. They are discussed here since they repre- 
sent exceptions to the rules mentioned on page 21. 

The instructions FNINIT, FNSAVE, FNSTENV, 
FNSTSW, FNCLEX, FNENI, and FNDISI do not wait
for the current numeric instruction to finish before they
execute. Of these instructions, FNINIT, FNSTSW, 
FNCLEX, FNENI and FNDISI will produce different 
results, depending on when they are executed relative to 
the current numeric instruction. 

For example, FNCLEX will cause a different status 
value to result from a concurrent arithmetic operation, 
depending on whether it is executed before or after the
error status bits are updated at the end of the arithmetic 
operation. The intended use of FNCLEX is to clear a 
known error status bit which has caused BUSY to be 
asserted, avoiding deadlock. 

FNSTSW will safely, without deadlock, report the busy 
and error status of the NPX independent of the NDP in- 
terrupt status. 

FNINIT, FNENI, and FNDISI are used to place the 
NPX into a known state independent of its current 
state. FNDISI will prevent an unmasked error from 
asserting BUSY without disturbing the current error 
status bits. Appendix A shows an example of using 
FNDISI. 

The instructions FNSAVE and FNSTENV provide spe- 
cial functions. They allow saving the state of the NPX in 
a single instruction when host interrupts are disabled. 

Several host and numeric instructions are necessary to
save the state of the NPX when the host interrupt state is
unknown. Appendix A and B show examples of saving
the NPX state. As the Numerics Supplement explains,
host interrupts must always be disabled when executing
FNSAVE or FNSTENV.

The seven instructions FSTSW/FNSTSW, FSTCW/ 
FNSTCW, FLDCW, FLDENV, and FRSTOR do not 
require explicit WAIT instructions for data synchro- 
nization. All of these instructions are used to interrogate 
or control the numeric context. 

Data synchronization for these instructions is 
automatically provided by the coprocessor interface. 
The 8087 will take exclusive control of the memory bus, 
preventing the host from interfering with the data values 
before the 8087 can read them. Eliminating the need for 
a WAIT instruction avoids potential deadlock pro- 
blems. 

The three load instructions FLDCW, FLDENV, and 
FRSTOR can unmask a numeric error, activating the 
8087 BUSY signal. Such an error was the result of a 
previous numeric instruction and is not related to any 
fault in the instruction. 

Automatic data synchronization matters here because the
host’s interrupts are usually disabled in context switching
or interrupt handling. Deadlock might result if the
host executed a WAIT instruction with its interrupts
disabled after these instructions. After the host interrupts
are enabled, an interrupt will occur if an unmasked
error was pending.

PROGRAMMING TECHNIQUES

The NPX provides a stack-oriented register set with 
stack-oriented instructions for numeric operands. These 
registers and instructions are optimized for numeric 
programs. For many programmers, these are new re- 
sources with new programming options available. 

Using Numeric Registers and Instructions

The register and instruction set of the NDP is optimized 
for the needs of numeric and general purpose programs. 
The host CPU provides the instructions and data types 
needed for general purpose data processing, while the 
8087 provides the data types and instructions for 
numeric processing. 

The instructions and data types recognized by the 8087 
are different from the CPU because numeric program 
requirements are different from those of general pur- 
pose programs. Numeric programs have long arithmetic 
expressions where a few temporary values are used in a 
few statements. Within these statements, a single value 
may be referenced many times. Due to the time involved 
to transfer values between registers and memory, a 
significant speed optimization is possible by keeping 
numbers in the NPX register file. 

In contrast, a general data processor is more concerned 
with addressing data in simple expressions and testing 
the results. Temporary values, constant across several 
instructions, are not as common nor is the penalty as 
large for placing them in memory. As a result it is
simpler for compilers and programmers to manage 
memory based values. 


NPX Register Usage 

The eight numeric registers in the NDP are stack ori- 
ented. All numeric registers are addressed relative to a 
value called the TOP pointer, defined in the NDP status 
register. A register address given in an instruction is
added to the TOP value to form the internal absolute
address. Relative addressing of numeric registers has
advantages analogous to those of relative addressing of
memory operands.

Two modes are available for addressing the numeric 
registers. The first mode implicitly uses the top and op- 
tional next element on the stack for operands. This 
mode does not require any addressing bits in a numeric 
instruction. Special purpose instructions use this mode 
since full addressing flexibility is not required. 

The other addressing mode allows any other stack ele- 
ment to be used together with the top of stack register. 
The top of stack or the other register may be specified as 
the destination. Most two-operand arithmetic instruc- 
tions allow this addressing mode. Short, easy to develop 
numeric programs are the result. 

Just as relative addressing of memory operands avoids 
concerns with memory allocation in other parts of a 
program, top relative register addressing allows registers 
to be used without regard for numeric register assign- 
ments in other parts of the program. 
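
TOP-relative addressing can be sketched in a few lines. The Python below is a simplified model, not a description of the 8087 internals: the class and method names are invented for illustration, the tag word and stack overflow detection are left out, and only the arithmetic on the 3-bit TOP field is shown (a load decrements TOP, a pop increments it, and ST(i) selects physical register TOP + i modulo 8).

```python
class RegisterStack:
    """Eight physical registers addressed relative to a 3-bit TOP field."""

    def __init__(self):
        self.physical = [None] * 8
        self.top = 0

    def index(self, i):
        """Physical register number selected by ST(i)."""
        return (self.top + i) & 7

    def st(self, i):
        return self.physical[self.index(i)]

    def push(self, value):
        self.top = (self.top - 1) & 7   # a load decrements TOP, then writes ST(0)
        self.physical[self.top] = value

    def pop(self):
        value = self.physical[self.top]
        self.physical[self.top] = None
        self.top = (self.top + 1) & 7   # a pop increments TOP
        return value


stack = RegisterStack()
stack.push("main value")
print(stack.index(0))        # 7: ST(0) as seen by the main program
stack.push("subroutine value")
print(stack.index(0))        # 6: ST(0) as seen by the subroutine
print(stack.st(1))           # main value, now reached as ST(1)
```

The same relative name ST(0) refers to two different physical registers, so the subroutine never needs to know which registers its caller is using.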

STACK RELATIVE ADDRESSING EXAMPLE 

Consider an example of a main program calling a 
subroutine, each using register addressing independent 
of the other. (Fig. 19) By using different values of the 
TOP field, different software can use the same relative 
register addresses as other parts of the program, but 
refer to different physical registers. 


MAIN_PROGRAM:
        FLD     A
        FADD    ST, ST(1)
        CALL    SUBROUTINE      ; Argument is in ST(0)
        FSTP    B

SUBROUTINE:
        FLD     ST              ; ST(0) = ST(1) = Argument
        FSQRT                   ; Main program ST(1) is
        FADD    C               ; safe in ST(2) here
        FMULP   ST(1), ST
        RET

Figure 19. Stack Relative Addressing Example

Of course, there is a limit to any physical resource. The
NDP has eight numeric registers. Normally, program- 
mers must ensure a maximum of eight values are pushed 
on the numeric register stack at any time. For time- 
critical inner loops of real-time applications, eight regis- 
ters should contain all the values needed. 

REGISTER STACK EXTENSION 

This hardware limitation can be hidden by software. 
Software can provide “virtual” numeric registers, ex- 
panding the register stack size to 6000 or more. 

The numeric register stack can be extended into memory 
via unmasked numeric invalid errors which cause an in- 
terrupt on stack overflow or underflow. The interrupt 
handler for the invalid error would manage a memory 
image of the numeric stack copying values into and out 
of memory as needed. 

The NPX will contain all the necessary information to 
identify the error, failing instruction, required registers, 
and destination register. After correcting for the missing 
hardware resource, the original numeric operation 
could be repeated. Either the original numeric instruction
could be single stepped or the effect of the instruction
emulated by a composite of table-based numeric
instructions executed by the error handler.
tion, the activity of the error handler will be transparent
to programs. This type of extension to the NDP allows 
programs to push and pop numeric registers without 
regard for their usage by other subroutines. 
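
The idea of a register stack extended into memory can be modeled as follows. This Python sketch is an assumption-laden illustration, not the handler from Intel's software: the class and method names are hypothetical, the invalid-error interrupt is represented by an ordinary method call, and the handler moves one register at a time, where a real handler would probably move several to reduce the number of interrupts.

```python
class ExtendedStack:
    HARDWARE_REGISTERS = 8

    def __init__(self):
        self.registers = []   # hardware stack, last element is ST(0)
        self.memory = []      # memory image managed by the error handler
        self.faults = 0       # number of simulated invalid-error interrupts

    def push(self, value):
        if len(self.registers) == self.HARDWARE_REGISTERS:
            self._handle_overflow()
        self.registers.append(value)

    def pop(self):
        if not self.registers:
            self._handle_underflow()
        return self.registers.pop()

    def _handle_overflow(self):
        self.faults += 1
        # Spill the oldest (bottom) register to the memory image.
        self.memory.append(self.registers.pop(0))

    def _handle_underflow(self):
        self.faults += 1
        if not self.memory:
            raise IndexError("virtual register stack is empty")
        # Reload the most recently spilled value.
        self.registers.append(self.memory.pop())


stack = ExtendedStack()
for value in range(1, 11):          # ten pushes on an eight-register stack
    stack.push(value)
print([stack.pop() for _ in range(10)])  # [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
print(stack.faults)                      # 4
```

The calling program sees a ten-deep stack; the only visible cost is the four simulated faults.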

Programming Conventions 

With a better understanding of the stack registers, let’s 
consider some useful programming conventions. Fol- 
lowing these conventions ensures compatibility with 
Intel support software and high level language calling 
conventions. 

1) If the numeric registers are not extended to 
memory, the programmer must ensure that the 
number of temporary values left in the NPX stack 
and those registers used by the caller does not exceed 
8. Values can be stored to memory to provide enough 
free NPX registers. 

2) Pass the first seven numeric parameters to a subrou- 
tine in the numeric stack registers. Any extra param- 
eters can be passed on the host’s stack. Push the 
values on the register or memory stack in left to right 
order. If the subroutine does not need to allocate any 
more numeric registers, it can execute solely out of 
the numeric register stack. The eighth register can be 
used for arithmetic operations. All parameters 
should be popped off when the subroutine com- 
pletes. 


3) Return all numeric values on the numeric stack. The 
caller may now take advantage of the extended preci- 
sion and flexible store modes of the NDP. 

4) Finish all memory reads or writes by the NPX before 
exiting any subroutine. This guarantees correct data 
and error synchronization. A numeric operation 
based solely on register contents is safe to leave run- 
ning on subroutine exit. 

5) The operating mode of the NDP should be transpar- 
ent across any subroutine. The operating mode is 
defined by the control word of the NDP. If the sub- 
routine needs to use a different numeric operating 
mode than that of the caller, the subroutine should 
first save the current control word, set the new oper- 
ating mode, then restore the original control word 
when completed. 


PROGRAMMING EXAMPLES 

The last section of this application note will discuss five 
programming examples. These examples were picked to 
illustrate NDP programming techniques and commonly
used functions. All have been coded, assembled, and 
tested. However, no guarantees are made regarding 
their correctness. 

The programming examples are: numeric context
switching, saving the numeric context without
FSAVE/FNSAVE, converting ASCII to floating point,
converting floating point to ASCII, and trigonometric 
functions. Each example is listed in a different appendix 
with a detailed written description in the following text. 
The source code is available in machine readable form 
from the Intel Insite User’s Library, “Interactive 8087 
Instruction Interpreter,” catalog item AA20. 

The examples provide some basic functions needed to 
get started with the numeric data processor. They work 
with either the 8087 or the 8087 emulator with no source 
changes. 

The context switching examples are needed for 
operating systems or interrupt handlers which may use 
numeric instructions and operands. Converting between 
floating point and decimal ASCII will be needed to in- 
put or output numbers in easy to read form. The trigo- 
nometric examples help you get started with sine or 
cosine functions and can serve as a basis for optimiza- 
tions if the angle arguments always fall into a restricted 
range. 

APPENDIX A


OVERVIEW 

Appendix A shows deadlock-free examples of numeric 
context switching. Numeric context switching is re- 
quired by interrupt handlers which use the NPX and 
operating system context switchers. Context switching 
consists of two basic functions, save the numeric con- 
text and restore it. These functions must work indepen- 
dent of the current state of the NPX. 

Two versions of the context save function are shown. 
They use different versions of the save context instruc- 
tion. The FNSAVE/FSAVE instructions do all the work 
of saving the numeric context. The state of host inter- 
rupts will decide which instruction to use. 

Using FNSAVE 

The FNSAVE instruction is intended to save the NPX 
context when host interrupts are disabled. The host does 
not have to wait for the 8087 to finish its current opera- 
tion before starting this operation. Eliminating the in- 
struction synchronization wait avoids any potential 
deadlock. 

The 8087 Bus Interface Unit (BIU) will save this instruc- 
tion when encountered by the host and hold it until the 
8087 Floating point Execution Unit (FEU) finishes its 
current operation. When the FEU becomes idle, the 
BIU will start the FEU executing the save context opera- 
tion. 

The host can execute other non-numeric instructions 
after the FNSAVE while the BIU waits for the FEU to 
finish its current operation. The code starting at 
NO_INT_NPX_SAVE shows how to use the 
FNSAVE instruction. 

When executing the FNSAVE instruction, host inter- 
rupts must be disabled to avoid recursions of the in- 
struction. The 8087 BIU can hold only one FNSAVE in- 
struction at a time. If host interrupts were not disabled, 
another host interrupt might cause a second FNSAVE 
instruction to be executed, destroying the previous one 
saved in the 8087 BIU. 

It is not recommended to explicitly disable host inter- 
rupts just to execute an FNSAVE instruction. In 
general, such an operation may not be the best course of 
action or even be allowed. 

If host interrupts are enabled during the NPX context 
save function, it is recommended to use the FSAVE in- 
struction as shown by the code starting at NPX_SAVE. 
This example will always work, free of deadlock, in- 
dependent of the NDP interrupt state. 


Using FSAVE 

The FSAVE instruction performs the same operation as 
FNSAVE but it uses standard instruction synchroniza- 
tion. The host will wait for the FEU to be idle before 
initiating the save operation. Since the host ignores all 
interrupts between completing a WAIT instruction and 
starting the following ESCAPE instruction, the FEU is 
ready to immediately accept the operation (since it is not 
signalling BUSY). No recursion of the save context 
operation in the BIU is possible. However, deadlock 
must be considered since the host executes a WAIT in- 
struction. 

To avoid deadlock when using the FSAVE instruction, 
the 8087 must be prevented from signalling BUSY when 
an unmasked error exists. 

The Interrupt Enable Mask (IEM) bit in the NPX control
word provides this function. When IEM = 1, the
8087 will not signal BUSY or INT if an unmasked error
exists. The NPX instruction FNDISI will set the IEM
independent of any pending errors without causing
deadlock or any other errors. Using the FNDISI and 
FSAVE instructions together with a few other glue in- 
structions allows a general NPX context save function. 
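
The role of the glue instructions can be illustrated with a model of the saved image. The Python sketch below is a simplification under stated assumptions: it assumes IEM is bit 7 of the control word and that the control word is the first word of the 47-word (94-byte) save area, the function names are hypothetical, and the remaining 92 bytes of NPX state are treated as an opaque placeholder. The comments name the instructions of the NPX_save routine listed later in this appendix.

```python
import struct

IEM = 0x0080            # assumed position of the interrupt-enable mask (bit 7)
SAVE_AREA_BYTES = 94    # 47 words


def fndisi(control_word):
    """Model of FNDISI: set IEM, leave every other control bit alone."""
    return control_word | IEM


def interrupts_disabled(control_word):
    return bool(control_word & IEM)


def npx_save(control_word, rest_of_state):
    """Return the save-area image left behind by the NPX_save sequence."""
    if len(rest_of_state) != SAVE_AREA_BYTES - 2:
        raise ValueError("rest_of_state must be 92 bytes")
    original = control_word                      # FNSTCW save_area / MOV ax, save_area
    control_word = fndisi(control_word)          # FNDISI
    image = bytearray(struct.pack("<H", control_word) + rest_of_state)  # FSAVE
    struct.pack_into("<H", image, 0, original)   # MOV save_area, ax
    return bytes(image)


image = npx_save(0x0300, bytes(92))
print(len(image))                                # 94
print(hex(struct.unpack_from("<H", image)[0]))   # 0x300, IEM as the caller left it
```

Without the final store, the saved control word would carry IEM = 1, and a later restore would silently disable numeric interrupts for the interrupted program.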

Standard data and instruction synchronization is re- 
quired after executing the FNSAVE/FSAVE instruc- 
tion. The wait instruction following an FNSAVE/ 
FSAVE instruction is always safe since all NPX errors 
will be masked as part of the instruction execution. 
Deadlock is not possible since the 8087 will eventually 
signal not busy, allowing the host to continue on. 


PLACING THE SAVE CONTEXT FUNCTION 

Deciding on where to save the NPX context in an inter- 
rupt handler or context switcher is dependent on 
whether interrupts can be enabled inside the function. 
Since interrupt latency is measured in terms of the max- 
imum time interrupts are disabled, the maximum wait 
time of the host at the data synchronizing wait instruc- 
tion after the FNSAVE or the FSAVE instruction is im- 
portant if host interrupts are disabled while waiting. 

The wait time will be the maximum single instruction 
execution time of the 8087 plus the execution time of the 
save operation. This maximum time will be approxi- 
mately 1300 or 1500 clocks, depending on whether the 
host is an 8086 or 8088, respectively. The actual time 
will depend on how much concurrency of execution bet- 
ween the host and 8087 is provided. The greater the 
concurrency, the lesser the maximum wait time will be. 

If host interrupts can be enabled during the context save
function, it is recommended to use the FSAVE instruction
for saving the numeric context in the interruptible
section. The FSAVE instruction allows instruction and
data synchronizing waits to be interruptible. This
technique removes the maximum execution time of 8087
instructions from system interrupt latency time
considerations.

It is recommended to delay starting the numeric save 
function as long as possible to maintain the maximum 
amount of concurrent execution between the host and 
the 8087. 


Using FRSTOR 

Restoring the numeric context with FRSTOR does not 
require a data synchronizing wait afterwards since the 
8087 automatically prevents the host from interfering 
with the memory load operation. 

The code starting with NPX_RESTORE illustrates the 
restore operation. Error synchronization is not 
necessary since the FRSTOR instruction itself does not 
cause errors, but the previous state of the NPX may in- 
dicate an error. 

If further numeric instructions are executed after the 
FRSTOR, and the error state of the new NPX context is 
unknown, deadlock may occur if numeric exceptions 
cannot interrupt the host. 


NPX_save

; General purpose save of NPX context. This function will work independent of the interrupt state of
; the NDP. Deadlock can not occur. 47 words of memory are required by the variable save_area.
; Register ax is not transparent across this code.

NPX_save:
        FNSTCW  save_area       ; Save IEM bit status
        NOP                     ; Delay while 8087 saves control register
        FNDISI                  ; Disable 8087 BUSY signal
        MOV     ax, save_area   ; Get original control word
        FSAVE   save_area       ; Save NPX context, the host can be safely interrupted while
                                ; waiting for the 8087 to finish. Deadlock is not possible since
                                ; IEM = 1.
        FWAIT                   ; Wait for save to finish.
        MOV     save_area, ax   ; Put original control word into NPX context area. All done


no_int_NPX_save

; Save the NPX context with host interrupts disabled. No deadlock is possible. 47 words of memory
; are required by the variable save_area.

no_int_NPX_save:
        FNSAVE  save_area       ; Save NPX context. Wait for save to finish, no deadlock
        FWAIT                   ; is possible. Interrupts may be enabled now, all done


NPX_restore

; Restore the NPX context saved earlier. No deadlock is possible if no further numeric instructions
; are executed until the 8087 numeric error interrupt is enabled. The variable save_area is assumed
; to hold an NPX context saved earlier. It must be 47 words long.

NPX_restore:
        FRSTOR  save_area       ; Load new NPX context


APPENDIX B


OVERVIEW 

Appendix B shows alternative techniques for switching 
the numeric context without using the FSAVE/ 
FNSAVE or FRSTOR instructions. These alternative 
techniques are slower than those of Appendix A but 
they reduce the worst case continuous local bus usage of 
the 8087. 

Only an iAPX 86/22 or iAPX 88/22 could derive any 
benefit from this alternative. By replacing all 
FSAVE/FNSAVE instructions in the system, the worst 
case local bus usage of the 8087 will be 10 or 16 con- 
secutive memory cycles for an 8086 or 8088 host, respec- 
tively. 

Instead of saving and loading the entire numeric context 
in one long series of memory transfers, these routines 
use the FSTENV/FNSTENV/FLDENV instructions 
and separate numeric register load/store instructions. 
Using separate load/store instructions for the numeric 
registers forces the 8087 to release the local bus after 
each numeric load/store instruction. The longest series
of back-to-back memory transfers required by these
instructions is 8/12 memory cycles for an 8086 or 8088
host, respectively. In contrast, the FSAVE/FNSAVE/FRSTOR
instructions perform 50/94 back-to-back memory cycles
for an 8086 or 8088 host.

Compatibility With FSAVE/FNSAVE

This function produces a context area of the same for- 
mat produced by FSAVE/FNSAVE instructions. Other 
software modules expecting such a format will not be 
affected. All the same interrupt and deadlock considera- 
tions of FSAVE and FNSAVE also apply to FSTENV 
and FNSTENV. Except for the fact that the numeric 
environment is 7 words rather than the 47 words of the 
numeric context, all the discussion of Appendix A also 
applies here. 


The state of the NPX registers must be saved in memory
in the same format as the FSAVE/FNSAVE instructions.
The program example starting at the label
SMALL_BLOCK_NPX_SAVE saves the registers with a
loop that will store their contents into memory in the
same top relative order as that of FSAVE/FNSAVE.


To save the registers with FSTP instructions, they must 
be tagged valid, zero, or special. This function will force 
all the registers to be tagged valid, independent of their 
contents or old tag, and then save them. No problems 
will arise if the tag value conflicts with the register’s 
content for the FSTP instruction. Saving empty registers
ensures compatibility with the FSAVE/FNSAVE
instructions. After saving all the numeric registers, they
will all be tagged empty, the same as if an
FSAVE/FNSAVE instruction had been executed.


Compatibility With FRSTOR 

Restoring the numeric context reverses the procedure 
described above, as shown by the code starting at 
SMALL_BLOCK_NPX_RESTORE. All eight registers
are reloaded in the reverse order. With each
register load, a tag value will be assigned to each
register. The tags assigned by the register loads do not
matter since the tag word will be overwritten when the
environment is reloaded later with FLDENV.

Two assumptions are required for correct operation of 
the restore function: all numeric registers must be empty 
and the TOP field must be the same as that in the con- 
text being restored. These assumptions will be satisfied 
if a matched set of pushes and pops were performed bet- 
ween saving the numeric context and reloading it. 
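
The two assumptions can be phrased as a check on the status and tag words. The following is a minimal sketch in Python rather than ASM-86, for illustration only. The names top_field and restore_preconditions_met are hypothetical and do not appear in the listings. It assumes the 8087 status word layout, with TOP in bits 11 to 13 (the field selected by the mask 3800H), and a tag word of all ones when every register is tagged empty.

```python
TOP_MASK = 0x3800      # status word bits 11-13 hold the TOP field
ALL_EMPTY = 0xFFFF     # tag word: two bits per register, 11 means empty


def top_field(status_word):
    """Extract the 3-bit TOP field from an 8087 status word."""
    return (status_word & TOP_MASK) >> 11


def restore_preconditions_met(current_status, current_tags, saved_status):
    """True when all registers are empty and TOP matches the saved context."""
    registers_empty = current_tags == ALL_EMPTY
    top_matches = top_field(current_status) == top_field(saved_status)
    return registers_empty and top_matches
```

For example, a saved status word of 2B41H carries TOP = 5, so the restore may proceed only if the current TOP is also 5 and the tag word reads FFFFH.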

If these assumptions cannot be met, then the code example
starting at NPX_CLEAN shows how to force all the
NPX registers empty and set the TOP field of the status
word.

small_block_NPX_save

; Save the NPX context independent of NDP interrupt state. Avoid using the FSAVE instruction to
; limit the worst case memory bus usage of the 8087. The NPX context area formed will appear the
; same as if an FSAVE instruction had written into it. The variable save_area will hold the NPX
; context and must be 47 words long. The registers ax, bx, and cx will not be transparent.

small_block_NPX_save:
        FNSTCW  save_area           ; Save current IEM bit
        NOP                         ; Delay while 8087 saves control register
        FNDISI                      ; Disable 8087 BUSY signal
        MOV     ax, save_area       ; Get original control word
        MOV     cx, 8               ; Set numeric register count
        XOR     bx, bx              ; Tag field value for stamping all registers as valid
        FSTENV  save_area           ; Save NPX environment
        FWAIT                       ; Wait for the store to complete
        XCHG    save_area + 4, bx   ; Get original tag value and set new tag value
        FLDENV  save_area           ; Force all register tags as valid. BUSY is still masked. No data
                                    ; synchronization needed.
        MOV     save_area, ax       ; Put original control word into NPX environment.
        MOV     save_area + 4, bx   ; Put original tag word into NPX environment
        XOR     bx, bx              ; Set initial register index
reg_store_loop:
        FSTP    saved_reg [bx]      ; Save register
        ADD     bx, type saved_reg  ; Bump pointer to next register
        LOOP    reg_store_loop
                                    ; All done

NPX_clean

; Force the NPX into a clean state with TOP matching the TOP field stored in the NPX context and all
; numeric registers tagged empty. Save_area must be the NPX environment saved earlier.
; Temp_env is a 7 word temporary area used to build a prototype NPX environment. Register ax will
; not be transparent.

NPX_clean:
        FINIT                       ; Put NPX into known state
        MOV     ax, save_area + 2   ; Get original status word
        AND     ax, 3800H           ; Mask out the top field
        FSTENV  temp_env            ; Format a temporary environment area with all registers
                                    ; stamped empty and TOP field = 0.
        FWAIT                       ; Wait for the store to finish.
        OR      temp_env + 2, ax    ; Put in the desired TOP value.
        FLDENV  temp_env            ; Setup new NPX environment.
                                    ; Now enter small_block_NPX_restore

small_block_NPX_restore

; Restore the NPX context without using the FRSTOR instruction. Assume the NPX context is in the
; same form as that created by an FSAVE/FNSAVE instruction, all the registers are empty, and that
; the TOP field of the NPX matches the TOP field of the NPX context. The variable save_area must
; be an NPX context save area, 47 words long. The registers bx and cx will not be transparent.

small_block_NPX_restore:
        MOV     cx, 8                   ; Set register count
        MOV     bx, type saved_reg*7    ; Starting offset of ST(7)
reg_load_loop:
        FLD     saved_reg [bx]          ; Get the register
        SUB     bx, type saved_reg      ; Bump pointer to next register
        LOOP    reg_load_loop
        FLDENV  save_area               ; Restore NPX context
                                        ; All done


APPENDIX C 


OVERVIEW 

Appendix C shows how floating point values can be 
converted to decimal ASCII character strings. The func- 
tion can be called from PLM/86, PASCAL/86, FOR- 
TRAN/86, or ASM/86 functions. 

Shortness, speed, and accuracy were chosen rather than 
providing the maximum number of significant digits 
possible. An attempt is made to keep integers in their 
own domain to avoid unnecessary conversion errors. 

Using the extended precision real number format, this
routine achieves a worst case accuracy of three units in
the 16th decimal position for a non-integer value or
integers greater than 10**18. This is double precision
accuracy. With values having decimal exponents less than
100 in magnitude, the accuracy is one unit in the 17th
decimal position.

Higher precision can be achieved with greater care in 
programming, larger program size, and lower perfor- 
mance. 

Function Partitioning 

Three separate modules implement the conversion. 
Most of the work of the conversion is done in the mod- 
ule FLOATING_TO_ASCII. The other modules are 
provided separately since they have a more general use. 
One of them, GET_POWER_10, is also used by the 
ASCII to floating point conversion routine. The other 
small module, TOS_STATUS, will identify what, if 
anything, is in the top of the numeric register stack. 


Exception Considerations 

Care is taken inside the function to avoid generating ex- 
ceptions. Any possible numeric value will be accepted. 
The only exceptions possible would occur if insufficient 
space exists on the numeric register stack. 

The value passed in the numeric stack is checked for ex- 
istence, type (NAN or infinity), and status (unnormal, 
denormal, zero, sign). The string size is tested for a 
minimum and maximum value. If the top of the register 
stack is empty, or the string size is too small, the func- 
tion will return with an error code. 

Overflow and underflow are avoided inside the function
for very large or very small numbers.

Special Instructions 

The functions demonstrate the operation of several 
numeric instructions, different data types, and precision 
control. Shown are instructions for automatic conver- 
sion to BCD, calculating the value of 10 raised to an in- 
teger value, establishing and maintaining concurrency, 
data synchronization, and use of directed rounding on 
the NPX. 

Without the extended precision data type and built-in 
exponential function, the double precision accuracy of 
this function could not be attained with the size and 
speed of the shown example. 

The function relies on the numeric BCD data type for
conversion from binary floating point to decimal. It is
not difficult to unpack the BCD digits into separate
ASCII decimal digits. The major work involves scaling
the floating point value to the comparatively limited
range of BCD values. To print a 9-digit result requires
accurately scaling the given value to an integer between
10**8 and 10**9. For example, the number +0.123456789
requires a scaling factor of 10**9 to produce the value
+123456789.0 which can be stored in 9 BCD digits. The
scale factor must be an exact power of 10 to avoid
changing any of the printed digit values.

These routines should exactly convert all values exactly
representable in decimal in the field size given. Integer
values which fit in the given string size will not be
scaled, but directly stored into the BCD form.
Non-integer values exactly representable in decimal within
the string size limits will also be exactly converted. For
example, 0.125 is exactly representable in binary or 
decimal. To convert this floating point value to decimal, 
the scaling factor will be 1000, resulting in 125. When 
scaling a value, the function must keep track of where 
the decimal point lies in the final decimal value. 
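
The scaling of an exactly representable value, and the bookkeeping for the decimal point, can be sketched as follows. This is an illustration in Python, not part of the ASM-86 listing. The name scale_exact is hypothetical, and exact rational arithmetic (fractions.Fraction) stands in for the extended precision arithmetic of the NPX.

```python
from fractions import Fraction


def scale_exact(value, max_digits):
    """Scale value by the smallest power of ten that makes it an integer.

    Returns (digits, power) with value == digits * 10**power, or None when
    the value is not exactly representable in max_digits decimal digits.
    """
    exact = Fraction(value)              # exact binary value of the float
    for shift in range(max_digits + 1):
        scaled = exact * 10 ** shift     # scale factor is an exact power of ten
        if scaled.denominator == 1:      # the scaled value is an integer
            if abs(scaled.numerator) < 10 ** max_digits:
                return scaled.numerator, -shift
            return None                  # integer, but too wide for the field
    return None
```

With a 9-digit field, scale_exact(0.125, 9) yields the digits 125 and a power of -3, matching the scale factor of 1000 in the text. An integer such as 42.0 is returned unscaled with a power of zero.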

DESCRIPTION OF OPERATION 

Converting a floating point number to decimal ASCII
takes three major steps: identifying the magnitude of
the number, scaling it for the BCD data type, and
converting the BCD value to decimal ASCII.

Identifying the magnitude of the result requires finding
the value X such that the number is represented by
I*10**X, where 1.0 <= I < 10.0. Scaling the number
requires multiplying it by a scaling factor 10**S, such that
the result is an integer requiring no more decimal digits
than provided for in the ASCII string.

Once scaled, the numeric rounding modes and BCD 
conversion put the number in a form easy to convert to 
decimal ASCII by host software. 

Implementing each of these three steps requires atten- 
tion to detail. To begin with, not all floating point 
values have a numeric meaning. Values such as infinity, 
indefinite, or Not A Number (NAN) may be en- 
countered by the conversion routine. The conversion 
routine should recognize these values and identify them 
uniquely. 

Special cases of numeric values also exist. Denormals, 
unnormals, and pseudo zero all have a numeric value 
but should be recognized since all of them indicate that 
precision was lost during some earlier calculations. 

Once it has been determined that the number has a
numeric value, and it has been normalized with the
appropriate unnormal flags set, the value must be scaled
to the BCD range.


Scaling the Value 

To scale the number, its magnitude must be determined. 
It is sufficient to calculate the magnitude to an accuracy 
of 1 unit, or within a factor of 10 of the given value. 
After scaling the number, a check will be made to see if 
the result falls in the range expected. If not, the result 
can be adjusted one decimal order of magnitude up or 
down. The adjustment test after the scaling is necessary 
due to inevitable inaccuracies in the scaling value. 

Since the magnitude estimate need only be close, a fast
technique is used. The magnitude is estimated by
multiplying the power of 2, the unbiased floating point
exponent, associated with the number by log10(2). Rounding
the result to an integer will produce an estimate of
sufficient accuracy. Ignoring the fraction value can
introduce a maximum error of 0.32 in the result.
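
The estimate can be sketched in a few lines. This is a Python illustration, not the ASM-86 routine. The name estimate_power_ten is hypothetical, math.frexp stands in for the FXTRACT instruction, and Python floats are double precision rather than extended precision.

```python
import math

LOG10_2 = math.log10(2.0)


def estimate_power_ten(value):
    """Estimate the decimal order of magnitude from the binary exponent only."""
    _, exponent = math.frexp(abs(value))   # abs(value) = f * 2**exponent, 0.5 <= f < 1
    power_two = exponent - 1               # renormalize so the fraction is 1 <= F < 2
    return round(power_two * LOG10_2)      # the fraction F is deliberately ignored
```

The result is within one decimal order of the true magnitude for any nonzero finite value, which is why a later adjustment step is needed. Zero must be handled separately, as the conversion routine does.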

Using the magnitude of the value and size of the number
string, the scaling factor can be calculated. Calculating
the scaling factor is the most inaccurate operation of the
conversion process. The relation 10**X = 2**(X*log2(10)) is
used for this function. The exponentiate instruction
(F2XM1) will be used.

Due to restrictions on the range of values allowed by the
F2XM1 instruction, the power of 2 value will be split
into integer and fraction components. The relation
2**(I + F) = 2**I * 2**F allows using the FSCALE
instruction to recombine the 2**F value, calculated
through F2XM1, and the 2**I part.
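
The split into integer and fraction parts can be sketched as follows. This is a Python illustration only. The name power_of_ten_split is hypothetical, math.expm1 plays the role of F2XM1 (which returns 2**F - 1), and math.ldexp plays the role of the final scaling by 2**I. The sketch does not model the 8087 restriction of the F2XM1 argument to the range 0 to 0.5.

```python
import math


def power_of_ten_split(x):
    """Return 10**x computed as 2**I * 2**F, with I an integer and 0 <= F < 1."""
    power_two = x * math.log2(10.0)          # 10**x == 2**(x * log2(10))
    integer_part = math.floor(power_two)     # I
    fraction_part = power_two - integer_part # F
    two_to_f = math.expm1(fraction_part * math.log(2.0)) + 1.0  # 2**F via 2**F - 1
    return math.ldexp(two_to_f, integer_part)                   # multiply by 2**I
```

The result agrees with 10**x only to within the rounding error of the intermediate steps, which is the inaccuracy the text goes on to discuss.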

Inaccuracy in Scaling 

The inaccuracy of these operations arises because of the 
trailing zeroes placed into the fraction value when strip- 
ping off the integer valued bits. For each integer valued 
bit in the power of 2 value separated from the fraction 
bits, one bit of precision is lost in the fraction field due 
to the zero fill occurring in the least significant bits. 

Up to 14 bits may be lost in the fraction since the largest
allowed floating point exponent value is 2**14 - 1.
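
The loss can be counted directly under the rule stated above, one fraction bit per integer valued bit. This Python sketch is illustrative only, and the name fraction_bits_lost is hypothetical.

```python
import math


def fraction_bits_lost(power_ten):
    """Count the integer valued bits in the power of 2 for a given power of 10."""
    power_two = abs(power_ten) * math.log2(10.0)   # X * log2(10)
    return math.floor(power_two).bit_length()      # bits occupied by the integer part
```

A decimal exponent of 4932 gives an integer part of 16383, which occupies 14 bits. A scale factor of 10**18 costs only 6 bits.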

AVOIDING UNDERFLOW AND OVERFLOW 

The fraction and exponent fields of the number are
separated to avoid underflow and overflow in calculating
the scaling values. For example, to scale 10**-4932 to 10**18
requires a scaling factor of 10**4950 which cannot be
represented by the NPX.
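
The effect of the separation can be demonstrated in Python, whose double precision floats overflow near 10**308 instead of 10**4932. The sketch below is illustrative only, and the names power_ten_parts and scale_split are hypothetical. The scale factor 10**310 cannot be formed as a float, yet the scaled result is obtained by multiplying fractions and adding exponents.

```python
import math


def power_ten_parts(power):
    """Return (fraction, exponent) with 10**power == fraction * 2**exponent."""
    big = 10 ** power                        # exact Python integer, power >= 0
    exponent = big.bit_length()
    return big / (1 << exponent), exponent   # fraction lies in [0.5, 1)


def scale_split(value, scale_fraction, scale_exponent):
    """Multiply value by scale_fraction * 2**scale_exponent in separate parts."""
    value_fraction, value_exponent = math.frexp(value)
    product = value_fraction * scale_fraction                    # no overflow possible
    return math.ldexp(product, value_exponent + scale_exponent)  # small integer add
```

Scaling 1e-300 by the parts of 10**310 returns a value close to 1e10, although float("1e310") itself is infinite.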

By separating the exponent and fraction, the scaling
operation involves adding the exponents separate from
multiplying the fractions. The exponent arithmetic will
involve small integers, all easily represented by the
NPX.

FINAL ADJUSTMENTS

It is possible that the power function (GET_POWER_10)
could produce a scaling value such that it forms a scaled
result larger than the ASCII field could allow.
For example, scaling 9.9999999999999999e4900
by 1.00000000000000010e-4883 would produce
1.00000000000000009e18. The scale factor is within the
accuracy of the NDP and the result is within the
conversion accuracy, but it cannot be represented in BCD
format. This is why there is a post-scaling test on the
magnitude of the result. The result can be multiplied or
divided by 10, depending on whether the result was too
small or too large, respectively.
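
The post-scaling test can be sketched as a single step of ten in either direction. This is a Python illustration, not the ASM-86 code, and the name adjust_scaled is hypothetical. The power of ten is adjusted in the opposite direction so that scaled * 10**power is unchanged.

```python
def adjust_scaled(scaled, power, max_digits):
    """Bring scaled into [10**(max_digits-1), 10**max_digits) by one step of ten."""
    upper = float(10 ** max_digits)          # first value too large for the field
    lower = float(10 ** (max_digits - 1))    # smallest value using every digit
    if scaled >= upper:                      # too large: divide by 10
        return scaled / 10.0, power + 1
    if scaled < lower:                       # too small: multiply by 10
        return scaled * 10.0, power - 1
    return scaled, power
```

A value of 1.0e18 in an 18-digit field is reduced to 1.0e17 with the power raised by one, which is the case described in the example above.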


Output Format 

For maximum flexibility in output formats, the position 
of the decimal point is indicated by a binary integer 
called the power value. If the power value is zero, then 
the decimal point is assumed to be at the right of the 
right-most digit. Power values greater than zero indicate 
how many trailing zeroes are not shown. For each unit 
below zero, move the decimal point to the left in the 
string. 
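
The interpretation of the power value can be sketched as follows. This is a Python illustration for a caller of the conversion routine. The name place_decimal_point is hypothetical, and the digit string is assumed to exclude the sign character.

```python
def place_decimal_point(digits, power):
    """Render a digit string and power value as plain decimal text.

    The numeric value is int(digits) * 10**power.
    """
    if power >= 0:
        return digits + "0" * power + "."     # trailing zeroes were not shown
    shift = -power
    if shift >= len(digits):                  # point lies left of every digit
        return "0." + "0" * (shift - len(digits)) + digits
    return digits[:-shift] + "." + digits[-shift:]
```

The digits 125 with a power of -3 print as 0.125, with a power of zero as 125., and with a power of 2 as 12500.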

The last step of the conversion is storing the result in 
BCD and indicating where the decimal point lies. The 
BCD string is then unpacked into ASCII decimal char- 
acters. The ASCII sign is set corresponding to the sign 
of the original value. 
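
Unpacking the BCD value can be sketched as follows. This is a Python illustration, not the ASM-86 code, and the name unpack_bcd is hypothetical. It assumes the 8087 packed decimal layout: ten bytes, least significant byte first, two digits per byte in the first nine bytes, and the sign in the top bit of the last byte.

```python
def unpack_bcd(bcd_bytes, digit_count):
    """Unpack an 8087 packed decimal value into a signed ASCII digit string."""
    if len(bcd_bytes) != 10:
        raise ValueError("packed BCD operand is 10 bytes")
    sign = "-" if bcd_bytes[9] & 0x80 else "+"
    digits = []
    for byte in reversed(bcd_bytes[:9]):      # most significant digit pair first
        digits.append(chr(ord("0") + (byte >> 4)))
        digits.append(chr(ord("0") + (byte & 0x0F)))
    text = "".join(digits)[-digit_count:]     # keep the low-order digits only
    return sign + text
```

The bytes 25H, 01H followed by eight zero bytes unpack to +125 when three digits are requested.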


$title (Convert a floating point number to ASCII)
                name    floating_to_ascii
                public  floating_to_ascii
                extrn   get_power_10:near, tos_status:near
;
; This subroutine will convert the floating point number in the
; top of the 8087 stack to an ASCII string and separate power of 10
; scaling value (in binary). The maximum width of the ASCII string
; formed is controlled by a parameter which must be > 1. Unnormal values,
; denormal values, and pseudo zeroes will be correctly converted.
; A returned value will indicate how many binary bits of
; precision were lost in an unnormal or denormal value. The magnitude
; (in terms of binary power) of a pseudo zero will also be indicated.
; Integers less than 10**18 in magnitude are accurately converted if the
; destination ASCII string field is wide enough to hold all the
; digits. Otherwise the value is converted to scientific notation.
;
; The status of the conversion is identified by the return value,
; it can be:
;
;       0       conversion complete, string_size is defined
;       1       invalid arguments
;       2       exact integer conversion, string_size is defined
;       3       indefinite
;       4       + NAN (Not A Number)
;       5       - NAN
;       6       + Infinity
;       7       - Infinity
;       8       pseudo zero found, string_size is defined
;
; The PLM/86 calling convention is:
;
; floating_to_ascii:
;       procedure (number, denormal_ptr, string_ptr, size_ptr, field_size,
;               power_ptr) word external;
;       declare (denormal_ptr, string_ptr, power_ptr, size_ptr) pointer;
;       declare field_size word, string_size based size_ptr word;
;       declare number real;
;       declare denormal integer based denormal_ptr;
;       declare power integer based power_ptr;
;       end floating_to_ascii;
;
; The floating point value is expected to be on the top of the NPX
; stack. This subroutine expects 3 free entries on the NPX stack and
; will pop the passed value off when done. The generated ASCII string
; will have a leading character either '-' or '+' indicating the sign
; of the value. The ASCII decimal digits will immediately follow.

; The numeric value of the ASCII string is (ASCII STRING.)*10**POWER.
; If the given number was zero, the ASCII string will contain a sign
; and a single zero character. The value string_size indicates the total
; length of the ASCII string including the sign character. String(0) will
; always hold the sign. It is possible for string_size to be less than
; field_size. This occurs for zeroes or integer values. A pseudo zero
; will return a special return code. The denormal count will indicate
; the power of two originally associated with the value. The power of
; ten and ASCII string will be as if the value was an ordinary zero.
;
; This subroutine is accurate up to a maximum of 18 decimal digits for
; integers. Integer values will have a decimal power of zero associated
; with them. For non integers, the result will be accurate to within 2
; decimal digits of the 16th decimal place (double precision). The
; exponentiate instruction is also used for scaling the value into the
; range acceptable for the BCD data type. The rounding mode in effect
; on entry to the subroutine is used for the conversion.
;
; The following registers are not transparent:
;
;       ax bx cx dx si di flags
;


; Define the stack layout.
;
bp_save         equ     word ptr [bp]
es_save         equ     bp_save + size bp_save
return_ptr      equ     es_save + size es_save
power_ptr       equ     return_ptr + size return_ptr
field_size      equ     power_ptr + size power_ptr
size_ptr        equ     field_size + size field_size
string_ptr      equ     size_ptr + size size_ptr
denormal_ptr    equ     string_ptr + size string_ptr

parms_size      equ     size power_ptr + size field_size + size size_ptr +
&                       size string_ptr + size denormal_ptr
;




; Define constants used
;
BCD_DIGITS      equ     18      ; Number of digits in bcd_value
WORD_SIZE       equ     2
BCD_SIZE        equ     10
MINUS           equ     1       ; Define return values
NAN             equ     4       ; The exact values chosen here are
INFINITY        equ     6       ; important. They must correspond to
INDEFINITE      equ     3       ; the possible return values and be in
PSUEDO_ZERO     equ     8       ; the same numeric order as tested by
INVALID         equ     -2      ; the program.
ZERO            equ     -4
DENORMAL        equ     -6
UNNORMAL        equ     -8
NORMAL          equ     0
EXACT           equ     2


;
; Define layout of temporary storage area.
;
status          equ     word ptr [bp-WORD_SIZE]
power_two       equ     status - WORD_SIZE
power_ten       equ     power_two - WORD_SIZE
bcd_value       equ     tbyte ptr power_ten - BCD_SIZE
bcd_byte        equ     byte ptr bcd_value
fraction        equ     bcd_value

local_size      equ     size status + size power_two + size power_ten
&                       + size bcd_value
;
; Allocate stack space for the temporaries so the stack will be big enough
;

stack           segment stack 'stack'
                db      (local_size+6) dup (?)
stack           ends

cgroup          group   code
code            segment public 'code'
                assume  cs:cgroup
                extrn   power_table:qword
;
; Constants used by this function.
;
                even                    ; Optimize for 16 bits
const10         dw      10              ; Adjustment value for too big BCD
;
; Convert the C3,C2,C1,C0 encoding from tos_status into meaningful bit
; flags and values.
;
status_table    db      UNNORMAL, NAN, UNNORMAL + MINUS, NAN + MINUS,
&                       NORMAL, INFINITY, NORMAL + MINUS, INFINITY + MINUS,
&                       ZERO, INVALID, ZERO + MINUS, INVALID,
&                       DENORMAL, INVALID, DENORMAL + MINUS, INVALID



floating_to_ascii proc

        call    tos_status              ; Look at status of ST(0)
        mov     bx,ax                   ; Get descriptor from table
        mov     al,status_table[bx]
        cmp     al,INVALID              ; Look for empty ST(0)
        jne     not_empty
;
; ST(0) is empty! Return the status value.
;
        ret     parms_size
;
; Remove infinity from stack and exit.
;
found_infinity:

        fstp    st(0)                   ; OK to leave fstp running
        jmp     short exit_proc
;
; String space is too small! Return invalid code.
;
small_string:

        mov     al,INVALID

exit_proc:

        mov     sp,bp                   ; Free stack space
        pop     bp                      ; Restore registers
        pop     es
        ret     parms_size
;
; ST(0) is NAN or indefinite. Store the value in memory and look
; at the fraction field to separate indefinite from an ordinary NAN.
;
NAN_or_indefinite:

        fstp    fraction                ; Remove value from stack for examination
        test    al,MINUS                ; Look at sign bit
        fwait                           ; Insure store is done
        jz      exit_proc               ; Can't be indefinite if positive


        mov     bx,0C000H               ; Match against upper 16 bits of fraction
        sub     bx,word ptr fraction+6  ; Compare bits 63-48
        or      bx,word ptr fraction+4  ; Bits 32-47 must be zero
        or      bx,word ptr fraction+2  ; Bits 31-16 must be zero
        or      bx,word ptr fraction    ; Bits 15-0 must be zero
        jnz     exit_proc

        mov     al,INDEFINITE           ; Set return value for indefinite value
        jmp     exit_proc
;
; Allocate stack space for local variables and establish parameter
; addressability.
;
not_empty:

        push    es                      ; Save working register
        push    bp
        mov     bp,sp                   ; Establish stack addressability
        sub     sp,local_size

        mov     cx,field_size           ; Check for enough string space
        cmp     cx,2
        jl      small_string

        dec     cx                      ; Adjust for sign character
        cmp     cx,BCD_DIGITS           ; See if string is too large for BCD
        jbe     size_ok

        mov     cx,BCD_DIGITS           ; Else set maximum string size

size_ok:

        cmp     al,INFINITY             ; Look for infinity
        jge     found_infinity          ; Return status value for + or - inf.

        cmp     al,NAN                  ; Look for NAN or INDEFINITE
        jge     NAN_or_indefinite


;
; Set default return values and check that the number is normalized.
;
        fabs                            ; Use positive value only
                                        ; sign bit in al has true sign of value
        mov     dx,ax                   ; Save return value for later
        xor     ax,ax                   ; Form 0 constant
        mov     di,denormal_ptr         ; Zero denormal count
        mov     word ptr [di],ax
        mov     bx,power_ptr            ; Zero power of ten value
        mov     word ptr [bx],ax
        cmp     dl,ZERO                 ; Test for zero
        jae     real_zero               ; Skip power code if value is zero

        cmp     dl,DENORMAL             ; Look for a denormal value
        jae     found_denormal          ; Handle it specially

        fxtract                         ; Separate exponent from significand
        cmp     dl,UNNORMAL             ; Test for unnormal value
        jb      normal_value

        sub     dl,UNNORMAL-NORMAL      ; Return normal status with correct sign
;
; Normalize the fraction, adjust the power of two in ST(1) and set
; the denormal count value.
;
; Assert: 0 <= ST(0) < 1.0
;
        fld1                            ; Load constant to normalize fraction

normalize_fraction:

        fadd    st(1),st                ; Set integer bit in fraction
        fsub                            ; Form normalized fraction in ST(0)
        fxtract                         ; Power of two field will be negative
                                        ; of denormal count
        fxch                            ; Put denormal count in ST(0)




        fist    word ptr [di]           ; Put negative of denormal count in memory
        faddp   st(2),st                ; Form correct power of two in st(1)
                                        ; OK to use word ptr [di] now
        neg     word ptr [di]           ; Form positive denormal count
        jnz     not_psuedo_zero
;
; A pseudo zero will appear as an unnormal number. When attempting
; to normalize it, the resultant fraction field will be zero. Performing
; an fxtract on zero will yield a zero exponent value.
;
        fxch                            ; Put power of two value in st(0)
        fistp   word ptr [di]           ; Set denormal count to power of two value
                                        ; Word ptr [di] is not used by convert
                                        ; integer, OK to leave running
        sub     dl,NORMAL-PSUEDO_ZERO   ; Set return value saving the sign bit
        jmp     convert_integer         ; Put zero value into memory
;
; The number is a real zero, set the return value and setup for
; conversion to BCD.
;
real_zero:

        sub     dl,ZERO-NORMAL          ; Convert status to normal value
        jmp     convert_integer         ; Treat the zero as an integer
;
; The number is a denormal. FXTRACT will not work correctly in this
; case. To correctly separate the exponent and fraction, add a fixed
; constant to the exponent to guarantee the result is not a denormal.
;
found_denormal:

        fld1                            ; Prepare to bump exponent
        fxch
        fprem                           ; Force denormal to smallest representable
                                        ; extended real format exponent
        fxtract                         ; This will work correctly now
;
; The power of the original denormal value has been safely isolated.
; Check if the fraction value is an unnormal.
;
        fxam                            ; See if the fraction is an unnormal
        fstsw   status                  ; Save status for later
        fxch                            ; Put exponent in ST(0)
        fxch    st(2)                   ; Put 1.0 into ST(0), exponent in ST(2)
        sub     dl,DENORMAL-NORMAL      ; Return normal status with correct sign
        test    status,4400H            ; See if C3=C2=0 implying unnormal or NAN
        jz      normalize_fraction      ; Jump if fraction is an unnormal

        fstp    st(0)                   ; Remove unnecessary 1.0 from st(0)
;
; Calculate the decimal magnitude associated with this number to
; within one order. This error will always be inevitable due to
; rounding and lost precision. As a result, we will deliberately fail
; to consider the LOG10 of the fraction value in calculating the order.
; Since the fraction will always be 1 <= F < 2, its LOG10 will not change
; the basic accuracy of the function. To get the decimal order of magnitude,
; simply multiply the power of two by LOG10(2) and truncate the result to
; an integer.
;
normal_value:
not_psuedo_zero:


        fstp    fraction                ; Save the fraction field for later
        fist    power_two               ; Save power of two
        fldlg2                          ; Get LOG10(2)
                                        ; Power two is now safe to use
        fmul                            ; Form LOG10(of exponent of number)
        fistp   power_ten               ; Any rounding mode will work here
;
; Check if the magnitude of the number rules out treating it as
; an integer.
;
; CX has the maximum number of decimal digits allowed.
;

        fwait                           ; Wait for power_ten to be valid
        mov     ax,power_ten            ; Get power of ten of value
        sub     ax,cx                   ; Form scaling factor necessary in ax
        ja      adjust_result           ; Jump if number will not fit
;
;   The number is between 1 and 10**(field size). Test if it is an integer.
;
        fild    power_two               ; Restore original number
        mov     si,dx                   ; Save return value
        sub     dl,NORMAL-EXACT         ; Convert to exact return value
        fld     fraction
        fscale                          ; Form full value, this is safe here
        fst     st(1)                   ; Copy value for compare
        frndint                         ; Test if it's an integer
        fcomp                           ; Compare values
        fstsw   status                  ; Save status
        test    status,4000H            ; C3=1 implies it was an integer
        jnz     convert_integer

        fstp    st(0)                   ; Remove non integer value
        mov     dx,si                   ; Restore original return value

;
;   Scale the number to within the range allowed by the BCD format.
; The scaling operation should produce a number within one decimal order
; of magnitude of the largest decimal number representable within the
; given string width.
;
;   The scaling power of ten value is in ax.
;
adjust_result:

        mov     word ptr [bx],ax        ; Set initial power of ten return value
        neg     ax                      ; Subtract one for each order of
                                        ; magnitude the value is scaled by
        call    get_power_10            ; Scaling factor is returned as exponent
                                        ; and fraction
        fld     fraction                ; Get fraction
        fmul                            ; Combine fractions
        mov     si,cx                   ; Form power of ten of the maximum
        shl     si,1                    ; BCD value to fit in the string
        shl     si,1                    ; Index in si
        shl     si,1
        fild    power_two               ; Combine powers of two
        faddp   st(2),st
        fscale                          ; Form full value, exponent was safe
        fstp    st(1)                   ; Remove exponent

;
;   Test the adjusted value against a table of exact powers of ten.
; The combined errors of the magnitude estimate and power function can
; result in a value one order of magnitude too small or too large to fit
; correctly in the BCD field. To handle this problem, pretest the
; adjusted value. If it is too small or too large, then adjust it by ten
; and adjust the power of ten value.
;


test_power:

        fcom    power_table[si]+type power_table ; Compare against exact power
                                        ; entry. Use the next entry since cx
                                        ; has been decremented by one
        fstsw   status                  ; No wait is necessary
        test    status,4100H            ; If C3 = C0 = 0 then too big
        jnz     test_for_small

        fidiv   const10                 ; Else adjust value
        and     dl,not EXACT            ; Remove exact flag
        inc     word ptr [bx]           ; Adjust power of ten value
        jmp     short in_range          ; Convert the value to a BCD integer

test_for_small:

        fcom    power_table[si]         ; Test relative size
        fstsw   status                  ; No wait is necessary
        test    status,100H             ; If C0 = 0 then st(0) >= lower bound
        jz      in_range                ; Convert the value to a BCD integer

        fimul   const10                 ; Adjust value into range
        dec     word ptr [bx]           ; Adjust power of ten value

in_range:





        frndint                         ; Form integer value
;
;   Assert: 0 <= TOS <= 999,999,999,999,999,999
; The TOS number will be exactly representable in 18 digit BCD format.
;
convert_integer:

        fbstp   bcd_value               ; Store as BCD format number
;
;   While the store BCD runs, setup registers for the conversion to
; ASCII.
;
        mov     si,BCD_SIZE-2           ; Initial BCD index value
        mov     cx,0f04h                ; Set shift count and mask
        mov     bx,1                    ; Set initial size of ASCII field for sign
        mov     di,string_ptr           ; Get address of start of ASCII string
        mov     ax,ds                   ; Copy ds to es
        mov     es,ax
        cld                             ; Set autoincrement mode
        mov     al,'+'                  ; Clear sign field
        test    dl,MINUS                ; Look for negative value
        jz      positive_result

        mov     al,'-'

positive_result:

        stosb                           ; Bump string pointer past sign
        and     dl,not MINUS            ; Turn off sign bit
        fwait                           ; Wait for fbstp to finish


;
;   Register usage:
;       ah:     BCD byte value in use
;       al:     ASCII character value
;       dx:     Return value
;       ch:     BCD mask = 0fh
;       cl:     BCD shift count = 4
;       bx:     ASCII string field width
;       si:     BCD field index
;       di:     ASCII string field pointer
;       ds,es:  ASCII string segment base
;
;   Remove leading zeroes from the number.
;


skip_leading_zeroes:

        mov     ah,bcd_byte[si]         ; Get BCD byte
        mov     al,ah                   ; Copy value
        shr     al,cl                   ; Get high order digit
        and     al,ch                   ; Set zero flag
        jnz     enter_odd               ; Exit loop if leading non zero found

        mov     al,ah                   ; Get BCD byte again
        and     al,ch                   ; Get low order digit
        jnz     enter_even              ; Exit loop if non zero digit found

        dec     si                      ; Decrement BCD index
        jns     skip_leading_zeroes
;
;   The significand was all zeroes.
;
        mov     al,'0'                  ; Set initial zero
        stosb
        inc     bx                      ; Bump string length
        jmp     short exit_with_value




;
;   Now expand the BCD string into digit per byte values 0-9.
;
digit_loop:

        mov     ah,bcd_byte[si]         ; Get BCD byte
        mov     al,ah
        shr     al,cl                   ; Get high order digit

enter_odd:

        add     al,'0'                  ; Convert to ASCII
        stosb                           ; Put digit into ASCII string
        mov     al,ah                   ; Get low order digit
        and     al,ch
        inc     bx                      ; Bump field size counter

enter_even:

        add     al,'0'                  ; Convert to ASCII
        stosb                           ; Put digit into ASCII area
        inc     bx                      ; Bump field size counter
        dec     si                      ; Go to next BCD byte
        jns     digit_loop
;
;   Conversion complete. Set the string size and remainder.
;

exit_with_value:

        mov     di,size_ptr
        mov     word ptr [di],bx
        mov     ax,dx                   ; Set return value
        jmp     exit_proc

floating_to_ascii endp
code    ends
        end

ASSEMBLY COMPLETE, NO ERRORS FOUND
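
The magnitude estimate and the pretest against exact powers of ten used in the listing above can be sketched in a high-level language. The following Python fragment is only an illustration under stated assumptions: the function name `to_decimal_integer` is hypothetical, `math.frexp` stands in for FXTRACT, and exact rational arithmetic stands in for the 8087 extended format. It is not the routine from the listing.

```python
import math
from fractions import Fraction

LOG10_2 = math.log10(2.0)  # the constant FLDLG2 loads


def to_decimal_integer(value: float, digits: int) -> tuple[int, int]:
    """Return (n, p) with value ~= n * 10**p and n holding `digits` digits."""
    if value <= 0.0 or digits < 1:
        raise ValueError("this sketch handles positive values only")
    frac, exp = math.frexp(value)      # value = frac * 2**exp, 0.5 <= frac < 1
    power_two = exp - 1                # FXTRACT convention: 1 <= fraction < 2
    # Ignore LOG10 of the fraction: the estimate is within one order.
    power_ten = math.trunc(power_two * LOG10_2)
    scale = power_ten - (digits - 1)   # power of ten to divide out
    scaled = Fraction(value) / Fraction(10) ** scale
    # Pretest against exact powers of ten and correct the estimate by one.
    if scaled >= 10 ** digits:
        scaled /= 10
        scale += 1
    elif scaled < 10 ** (digits - 1):
        scaled *= 10
        scale -= 1
    n = round(scaled)
    if n == 10 ** digits:              # rounding carried into a new digit
        n //= 10
        scale += 1
    return n, scale
```

For example, `to_decimal_integer(0.5, 2)` first estimates a power of ten of 0, finds the scaled value 5 below the two-digit lower bound 10, and corrects the result to `(50, -2)`.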



$title(Calculate the value of 10**ax)
;
;   This subroutine will calculate the value of 10**ax.
; All 8086 registers are transparent and the value is returned on
; the TOS as two numbers, exponent in ST(1) and fraction in ST(0).
; The exponent value can be larger than the maximum representable
; exponent. Three stack entries are used.
;
        name    get_power_10
        public  get_power_10,power_table

stack   segment stack 'stack'
        dw      4 dup (?)               ; Allocate space on the stack
stack   ends

cgroup  group   code
code    segment public 'code'
        assume  cs:cgroup
;
;   Use exact values from 1.0 to 1e18.
;
        even                            ; Optimize 16 bit access
power_table     dq      1.0,1e1,1e2,1e3
                dq      1e4,1e5,1e6,1e7
                dq      1e8,1e9,1e10,1e11
                dq      1e12,1e13,1e14,1e15
                dq      1e16,1e17,1e18


get_power_10    proc

        cmp     ax,18                   ; Test for 0 <= ax < 19
        ja      out_of_range

        push    bx                      ; Get working index register
        mov     bx,ax                   ; Form table index
        shl     bx,1
        shl     bx,1
        shl     bx,1
        fld     power_table[bx]         ; Get exact value
        pop     bx                      ; Restore register value
        fxtract                         ; Separate power and fraction
        ret                             ; OK to leave fxtract running
;
;   Calculate the value using the exponentiate instruction.
; The following relations are used:
;       10**x = 2**(log2(10)*x)
;       2**(I+F) = 2**I * 2**F
;       if st(1) = I and st(0) = 2**F then fscale produces 2**(I+F)
;
out_of_range:

        fldl2t                          ; TOS = LOG2(10)
        push    bp                      ; Establish stack addressability
        mov     bp,sp
        push    ax                      ; Put power (P) in memory
        push    ax                      ; Allocate space for status
        fimul   word ptr [bp-2]         ; TOS,X = LOG2(10)*P = LOG2(10**P)
        fnstcw  word ptr [bp-4]         ; Get current control word
                                        ; Control word is a static value
        mov     ax,word ptr [bp-4]      ; Get control word, no wait necessary
        and     ax,not 0C00H            ; Mask off current rounding field
        or      ax,0400H                ; Set round to negative infinity
        xchg    ax,word ptr [bp-4]      ; Put new control word in memory
                                        ; old control word is in ax
        fld1                            ; Set TOS = -1.0
        fchs
        fld     st(1)                   ; Copy power value in base two
        fldcw   word ptr [bp-4]         ; Set new control word value
        frndint                         ; TOS = I: -inf < I <= X, I is an integer
        mov     word ptr [bp-4],ax      ; Restore original rounding control
        fldcw   word ptr [bp-4]
        fxch    st(2)                   ; TOS = X, ST(1) = -1.0, ST(2) = I
        pop     ax                      ; Remove original control word
        fsub    st,st(2)                ; TOS,F = X-I: 0 <= TOS < 1.0
        pop     ax                      ; Restore power of ten
        fscale                          ; TOS = F/2: 0 <= TOS < 0.5
        f2xm1                           ; TOS = 2**(F/2) - 1.0
        pop     bp                      ; Restore stack
        fsubr                           ; Form 2**(F/2)
        fmul    st,st(0)                ; Form 2**F
        ret                             ; OK to leave fmul running

get_power_10    endp
code    ends
        end

ASSEMBLY COMPLETE, NO ERRORS FOUND
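
The relations listed in the comments of get_power_10 can be checked numerically. The sketch below is a Python illustration, not the 8087 routine: the name `power_of_ten` is hypothetical, `math.expm1` plays the role of F2XM1, and double precision is assumed in place of the extended real format. It keeps the argument of the exponential at F/2, as the listing does, because F2XM1 on the 8087 only accepts arguments from 0 to 0.5.

```python
import math


def power_of_ten(p: int) -> tuple[float, int]:
    """Return (fraction, exponent) with 10**p ~= fraction * 2**exponent."""
    x = p * math.log2(10.0)            # 10**p = 2**x
    i = math.floor(x)                  # I: rounded toward negative infinity
    f = x - i                          # F: 0 <= F < 1
    half = math.expm1((f / 2.0) * math.log(2.0))  # 2**(F/2) - 1
    root = half + 1.0                  # 2**(F/2)
    return root * root, i              # (2**F, I)
```

Recombining the pair with `math.ldexp(fraction, exponent)` corresponds to the final FSCALE that the caller performs.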


$title(Determine TOS register contents)
;
;   This subroutine will return a value from 0-15 in ax corresponding
; to the contents of 8087 TOS. All registers are transparent and no
; errors are possible. The return value corresponds to c3,c2,c1,c0
; of FXAM instruction.
;
        name    tos_status
        public  tos_status

stack   segment stack 'stack'
        dw      3 dup (?)               ; Allocate space on the stack
stack   ends

cgroup  group   code
code    segment public 'code'
        assume  cs:cgroup
tos_status      proc

        fxam                            ; Get register contents status
        push    ax                      ; Allocate space for status value
        push    bp                      ; Establish stack addressability
        mov     bp,sp
        fstsw   word ptr [bp+2]         ; Put tos status in memory
        pop     bp                      ; Restore registers
        pop     ax                      ; Get status value, no wait necessary
        mov     al,ah                   ; Put bit 10-8 into bits 2-0
        and     ax,4007h                ; Mask out bits c3,c2,c1,c0
        shr     ah,1                    ; Put bit c3 into bit 11
        shr     ah,1
        shr     ah,1
        or      al,ah                   ; Put c3 into bit 3
        mov     ah,0                    ; Clear return value
        ret

tos_status      endp
code    ends
        end

ASSEMBLY COMPLETE, NO ERRORS FOUND
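
The bit manipulation in tos_status packs the four condition code bits of the 8087 status word (C3 in bit 14, C2-C0 in bits 10-8) into a value from 0 to 15. A minimal Python sketch of the same packing follows; the function name mirrors the listing, but the code is an illustration that operates on an integer standing in for the stored status word.

```python
def tos_status(status_word: int) -> int:
    """Pack C3,C2,C1,C0 of an 8087 status word into bits 3-0."""
    low = (status_word >> 8) & 0x07    # C2,C1,C0 are status bits 10-8
    c3 = (status_word >> 14) & 0x01    # C3 is status bit 14
    return (c3 << 3) | low
```

All other status bits (busy, stack top pointer, exception flags) are discarded, which is what the `and ax,4007h` mask achieves in the listing.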
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APPENDIX D 


OVERVIEW 

Appendix D shows a function for converting ASCII
input strings into floating point values. The returned
value can be used by PLM/86, PASCAL/86,
FORTRAN/86, or ASM/86. The routine accepts an ASCII
number in any of the standard FORTRAN formats. Up to
18 decimal digits are accepted, and the conversion
accuracy is the same as for converting in the other
direction. Greater accuracy can be achieved with
similar tradeoffs, as mentioned earlier.

Description of Operation 

Converting from ASCII to floating point is numerically
less complex than going from floating point to ASCII. It
consists of four basic steps: determine the size in
decimal digits of the number, build a BCD value
corresponding to the number string as if the decimal
point were at the far right, calculate the exponent
value, and scale the BCD value. The first three steps
are performed by the host software. The fourth step is
mainly performed by numeric operations.

The complexity in this function arises from the flexible
nature of the input values it will recognize. Most of the
code simply determines the meaning of each character
encountered. Two separate number inputs must be
recognized: the mantissa and the exponent values.
Performing the numeric operations is very straightforward.

The length of the number string is determined first to 
allow building a BCD number from low digits to high 
digits. This technique guarantees that an integer will be 
converted to its exact BCD integer equivalent. 
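
Building the BCD value from the low digits upward can be illustrated with a short Python sketch. It assumes the 8087 packed decimal layout (ten bytes, two digits per byte with the least significant pair first, sign in bit 7 of the last byte); the function name `pack_bcd` is hypothetical and the code is not the routine from the listing.

```python
def pack_bcd(digits: str, negative: bool = False) -> bytes:
    """Pack up to 18 ASCII decimal digits into a 10-byte packed BCD value."""
    if not digits or len(digits) > 18 or any(c not in "0123456789" for c in digits):
        raise ValueError("expected 1 to 18 decimal digits")
    out = bytearray(10)
    # Walk the string from its last (lowest order) digit to its first.
    for index, ch in enumerate(reversed(digits)):
        nibble = ord(ch) - ord("0")
        if index % 2:                  # odd positions fill the high nibble
            nibble <<= 4
        out[index // 2] |= nibble
    if negative:
        out[9] = 0x80                  # sign bit of the packed decimal format
    return bytes(out)
```

Because the string length is known before packing starts, the lowest order digit always lands in the low nibble of byte 0, so an integer is stored digit for digit with no arithmetic.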

If the number is a floating point value, then the digit
string can be scaled appropriately. If a decimal point
occurs within the string, the scale factor must be
decreased by one for each digit the decimal point is
moved to the right. This factor must be added to any
exponent value specified in the number.
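
As a worked illustration of the scale factor rule, the Python sketch below splits a number string into its digit string and its net power of ten. It is a simplification under stated assumptions: the input is unsigned and well formed, and the helper name `split_decimal` is hypothetical.

```python
def split_decimal(text: str) -> tuple[str, int]:
    """Return (digit string with the point removed, net power of ten)."""
    mantissa, _, exponent = text.lower().partition("e")
    whole, _, frac = mantissa.partition(".")
    digits = whole + frac              # decimal point moved to the far right
    scale = -len(frac)                 # one less for each digit passed over
    if exponent:
        scale += int(exponent)         # add any explicit exponent value
    return digits, scale
```

For 1.23456e5 the point moves five places to the right, giving a scale of -5, and the explicit exponent of 5 cancels it, so the value is the integer 123456 with a net power of ten of zero.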

ACCURACY CONSIDERATIONS 

All the same considerations for converting floating 
point to ASCII apply to calculating the scaling factor. 
The accuracy of the scale factor determines the accuracy 
of the result. 

The exponents and fractions are again kept separate to 
prevent overflows or underflows during the scaling 
operations. 
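
The benefit of keeping exponents and fractions separate can be seen in a small Python sketch. It assumes IEEE double precision rather than the 8087 extended format, and the name `scale_digits` is hypothetical. In doubles, 10**-330 alone is too small to represent, yet 10**18 scaled by it is representable; combining the fractions first and the exponents last avoids the intermediate underflow.

```python
import math


def scale_digits(digits: int, power: int) -> float:
    """Compute digits * 10**power without forming 10**power as one number."""
    x = power * math.log2(10.0)        # 10**power = 2**x
    exp10 = math.floor(x)              # exponent part of 10**power
    frac10 = 2.0 ** (x - exp10)        # fraction part, 1 <= frac10 < 2
    frac, exp = math.frexp(float(digits))  # digits = frac * 2**exp
    product = frac * frac10            # fractions combine, always in range
    return math.ldexp(product, exp + exp10)  # exponents combine last
```

The final `math.ldexp` call plays the role of FSCALE: only at that point is the full exponent applied to the result.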




$title(ASCII to floating point conversion)
;
;   Define the publicly known names.
;
        name    ascii_to_floating
        public  ascii_to_floating
        extrn   get_power_10:near
;
;   This function will convert an ASCII character string to a floating
; point representation. Character strings in integer or scientific form
; will be accepted. The allowed format is:
;
;       [+,-][digit(s)][.][digit(s)][E,e][+,-][digit(s)]
;
;   Where a digit must have been encountered before the exponent
; indicator 'E' or 'e'. If a '+', '-', or '.' was encountered, then at
; least one digit must exist before the optional exponent field. A value
; will always be returned in the 8087 stack. In case of invalid numbers,
; values like indefinite or infinity will be returned.
;
;   The first character not fitting within the format will terminate the
; conversion. The address of the terminating character will be returned
; by this subroutine.
;
;   The result will be left on the top of the NPX stack. This
; subroutine expects 3 free NPX stack registers. The sign of the result
; will correspond to any sign characters in the ASCII string. The rounding
; mode in effect at the time the subroutine was called will be used for
; the conversion from base 10 to base 2. Up to 18 significant decimal
; digits may appear in the number. Leading zeroes, trailing zeroes, or
; exponent digits do not count towards the 18 digit maximum. Integers
; or exactly representable decimal numbers of 18 digits or less will be
; exactly converted. The technique used constructs a BCD number

; representing the significant ASCII digits of the string with the decimal
; point removed.
;
;   An attempt is made to exactly convert relatively small integers or
; small fractions. For example the values: .06125, 123456789012345678,
; 1e17, 1.23456e5, and 125e-3 will be exactly converted to floating point.
;
;   The exponentiate instruction is used to scale the generated BCD value
; to very large or very small numbers. The basic accuracy of this function
; determines the accuracy of this subroutine. For very large or very small
; numbers, the accuracy of this function is 2 units in the 16th decimal
; place or double precision. The range of decimal powers accepted is
; 10**-4930 to 10**4930.
;
;   The PLM/86 calling format is:
;
; ascii_to_floating:
;       procedure (string_ptr,end_ptr,status_ptr) real external;
;       declare (string_ptr,end_ptr,status_ptr) pointer;
;       declare end based end_ptr pointer;
;       declare status based status_ptr word;
;       end;
;
;   The status value has 5 possible states:
;
;       0       A number was found.
;       1       No number was found, return indefinite.
;       2       Exponent was expected but none found, return indefinite.
;       3       Too many digits were found, return indefinite.
;       4       Exponent was too big, return a signed infinity.
;
;   The following registers are used by this subroutine:
;       ax  bx  cx  dx  si  di
;


;
;   Define constants.
;
LOW_EXPONENT    equ     -4930           ; Smallest allowed power of 10
HIGH_EXPONENT   equ     4930            ; Largest allowed power of 10
WORD_SIZE       equ     2
BCD_SIZE        equ     10
;
;   Define the parameter layouts involved:
;
bp_save         equ     word ptr [bp]
return_ptr      equ     bp_save + size bp_save
status_ptr      equ     return_ptr + size return_ptr
end_ptr         equ     status_ptr + size status_ptr
string_ptr      equ     end_ptr + size end_ptr

parms_size      equ     size status_ptr + size end_ptr + size string_ptr
;
;   Define the local variable data layouts
;
power_ten       equ     word ptr [bp - WORD_SIZE]       ; power of ten value
bcd_form        equ     tbyte ptr power_ten - BCD_SIZE  ; BCD representation

local_size      equ     size power_ten + size bcd_form
;
;   Define common expressions used
;
bcd_byte        equ     byte ptr bcd_form       ; Current byte in the BCD form
bcd_count       equ     (type(bcd_form)-1)*2    ; Number of digits in BCD form
bcd_sign        equ     byte ptr bcd_form + 9   ; Address of BCD sign byte
bcd_sign_bit    equ     80H
;
;   Define return values.
;
NUMBER_FOUND    equ     0               ; Number was found
NO_NUMBER       equ     1               ; No number was found
NO_EXPONENT     equ     2               ; No exponent was found when expected
TOO_MANY_DIGITS equ     3               ; Too many digits were found
EXPONENT_TOO_BIG equ    4               ; Exponent was too big


;
;   Allocate stack space to ensure enough exists at run time.
;
stack   segment stack 'stack'
        db      (local_size+4) dup (?)
stack   ends

cgroup  group   code
code    segment public 'code'
        assume  cs:cgroup
;
;   Define some of the possible return values.
;
        even                            ; Optimize 16 bit access
indefinite      dd      0FFC00000R      ; Single precision real for indefinite
infinity        dd      07F800000R      ; Single precision real for +infinity

ascii_to_floating proc

        fldz                            ; Prepare to zero BCD value
        push    bp                      ; Save callers stack environment
        mov     bp,sp                   ; Establish stack addressability
        sub     sp,local_size           ; Allocate space for local variables


;
;   Get any leading sign character to form initial BCD template.
;
        mov     si,string_ptr           ; Get starting address of the number
        xor     dx,dx                   ; Set initial decimal digit count
        cld                             ; Set autoincrement mode
;
;   Register usage:
;       al:     Current character value being examined
;       cx:     Digit count before the decimal point
;       dx:     Total digit count
;       si:     Pointer to character string
;
;   Look for an initial sign and skip it if found.
;
        lodsb                           ; Get first character
        cmp     al,'+'                  ; Look for a sign
        jz      scan_leading_digits

        cmp     al,'-'
        jnz     enter_leading_digits    ; If not test current character

        fchs                            ; Set TOS = -0
;
;   Count the number of digits appearing before an optional decimal point.
;
scan_leading_digits:

        lodsb                           ; Get next character

enter_leading_digits:

        call    test_digit              ; Test for digit and bump counter
        jnc     scan_leading_digits
;
;   Look for a possible decimal point and start fbstp operation.
; The fbstp zeroes out the BCD value and sets the correct sign.
;
        fbstp   bcd_form                ; Set initial sign and value of BCD number
        mov     cx,dx                   ; Save count of digits before decimal point
        cmp     al,'.'
        jnz     test_for_digits
;
;   Count the number of digits appearing after the decimal point.
;
scan_trailing_digits:

        lodsb                           ; Look at next character

        call    test_digit              ; Test for digit and bump counter
        jnc     scan_trailing_digits
;
;   There must be a ...
;
test_for_digits:

        dec     si                      ; Put si back on terminating character
        or      dx,dx                   ; Test digit count
        jz      no_number_found         ; Jump if no digits were found

        push    si                      ; Save pointer to terminator
        dec     si                      ; Backup pointer to last digit
;
;   Check that the ...
; CX becomes the initial scaling factor to account for the implied
; decimal point.
;
        sub     cx,dx                   ; For each digit to the right of the
                                        ; decimal point, subtract one from the
                                        ; initial scaling power
        neg     dx                      ; Use negative digit count so the
                                        ; test_digit routine can count dx up
                                        ; to zero
        cmp     dx,-bcd_count           ; See if too many digits found
        jb      test_for_unneeded_digits
;
;   Setup initial ...
; while building the BCD value in memory.
;
form_bcd_value:

        std                             ; Set autodecrement mode
        mov     power_ten,cx            ; Set initial power of ten
        xor     di,di                   ; Clear BCD number index
        mov     cl,4                    ; Set digit shift count
        fwait                           ; Ensure BCD store is done
        jmp     enter_digit_loop


;
;   No digits were encountered before testing for the exponent.
; Restore the string pointer and return an indefinite value.
;
no_number_found:

        mov     ax,NO_NUMBER            ; Set return status
        fld     indefinite              ; Return an indefinite numeric value
        jmp     exit
;
;   Test for a number of the form ???00000.
;
test_terminating_point:

        lodsb                           ; Get last character
        cmp     al,'.'                  ; Look for decimal point
        jz      enter_power_zeroes      ; Skip forward if found

        inc     si                      ; Else bump pointer back
        jmp     short enter_power_zeroes
;
;   Too many decimal digits encountered. Attempt to remove leading and
; trailing digits to bring the total into the bounds of the BCD format.
;
test_for_unneeded_digits:

        std                             ; Set autodecrement mode
        or      cx,cx                   ; See if any digits appeared to the
                                        ; right of the decimal point
        jz      test_terminating_point  ; Jump if none exist

        dec     dx                      ; Adjust digit counter for loop
;
;   Scan backwards from the right skipping trailing zeroes.
; If the end of the number is encountered, dx=0, the string consists of
; all zeroes!
;


skip_trailing_zeroes:

        inc     dx                      ; Bump digit count
        jz      look_for_exponent       ; Jump if string of zeroes found!

        lodsb                           ; Get next character
        inc     cx                      ; Bump power value for each trailing
        cmp     al,'0'                  ; zero dropped
        jz      skip_trailing_zeroes

        dec     cx                      ; Adjust power counter from loop
        cmp     al,'.'                  ; Look for decimal point
        jnz     scan_leading_zeroes     ; Skip forward if none found

        dec     dx                      ; Adjust counter for the decimal point
;
;   The string is of the form: ????.0000000
; See if any zeroes exist to the left of the decimal point.
;
enter_power_zeroes:

        dec     dx                      ; Adjust digit counter for loop

skip_power_zeroes:

        inc     dx                      ; Bump digit count
        jz      look_for_exponent

        lodsb                           ; Get next character
        inc     cx                      ; Bump power value for each trailing
        cmp     al,'0'                  ; zero dropped
        jz      skip_power_zeroes

        dec     cx                      ; Adjust power counter from loop
;
;   Scan the leading digits from the left to see if they are zeroes.
;
scan_leading_zeroes:

        lea     di,byte ptr [si+1]      ; Save new end of number pointer
        cld                             ; Set autoincrement mode
        mov     si,string_ptr           ; Set pointer to the start
        lodsb                           ; Look for sign character
        cmp     al,'+'
        je      skip_leading_zeroes

        cmp     al,'-'
        jne     enter_leading_zeroes
;
;   Drop leading zeroes. None of them affect the power value in cx.
; We are guaranteed at least one non zero digit to terminate the loop.
;
skip_leading_zeroes:

        lodsb                           ; Get next character

enter_leading_zeroes:

        inc     dx                      ; Bump digit count
        cmp     al,'0'                  ; Look for a zero
        jz      skip_leading_zeroes

        dec     dx                      ; Adjust digit count from loop
        cmp     al,'.'                  ; Look for 000.??? form
        jnz     test_digit_count
;
;   Number is of the form 000.????
; Drop all leading zeroes with no effect on the power value.
;
skip_middle_zeroes:

        inc     dx                      ; Remove the digit
        lodsb                           ; Get next character

        cmp     al,'0'
        jz      skip_middle_zeroes

        dec     dx                      ; Adjust digit count from loop
;
;   All superfluous zeroes are removed. Check if all is well now.
;
test_digit_count:

        cmp     dx,-bcd_count
        jb      too_many_digits_found

        mov     si,di                   ; Restore string pointer
        jmp     form_bcd_value

too_many_digits_found:

        fld     indefinite              ; Set return numeric value
        mov     ax,TOO_MANY_DIGITS      ; Set return flag
        pop     si                      ; Get last address
        jmp     exit
;
;   Build BCD form of the decimal ASCII string from right to left with
; trailing zeroes and decimal point removed. Note that the only non
; digit possible is a decimal point which can be safely ignored.
; Test digit will correctly count dx back towards zero to terminate
; the BCD build function.
;
get_digit_loop:

        lodsb                           ; Get next character
        call    test_digit              ; Check if digit and bump digit count
        jc      get_digit_loop          ; Skip the decimal point if found

        shl     al,cl                   ; Put digit into high nibble
        or      ah,al                   ; Form BCD byte in ah
        mov     bcd_byte[di],ah         ; Put into BCD string
        inc     di                      ; Bump BCD pointer
        or      dx,dx                   ; Check if digit is available
        jz      look_for_exponent

enter_digit_loop:

        lodsb                           ; Get next character
        call    test_digit              ; Check if digit
        jc      enter_digit_loop        ; Skip the decimal point

        mov     ah,al                   ; Save digit
        or      dx,dx                   ; Check if digit is available
        jnz     get_digit_loop

        mov     bcd_byte[di],ah         ; Save last odd digit
;
;   Look for an exponent indicator.
;
look_for_exponent:

        pop     si                      ; Restore string pointer
        cld                             ; Set autoincrement direction
        mov     di,power_ten            ; Get current power of ten
        lodsb                           ; Get next character
        cmp     al,'e'                  ; Look for exponent indication
        je      exponent_found

        cmp     al,'E'
        jne     convert
;
;   An exponent is expected, get its numeric value.
;
exponent_found:

        lodsb                           ; Get next character
        xor     di,di                   ; Clear power variable
        mov     cx,di                   ; Clear exponent sign flag and digit
                                        ; flag

        cmp     al,'+'                  ; Test for positive sign
        je      skip_power_sign

        cmp     al,'-'                  ; Test for negative sign
        jne     enter_power_loop
;
;   The exponent is negative.
;
        inc     ch                      ; Set exponent sign flag

skip_power_sign:
;
;   Register usage:
;       al:     exponent character being examined
;       bx:     return value
;       ch:     exponent sign flag 0 positive, 1 negative
;       cl:     digit flag 0 no digits found, 1 digits found
;       dx:     not usable since test_digit increments it
;       si:     string pointer
;       di:     binary value of exponent
;
;   Scan off exponent digits until a non-digit is encountered.
;
power_loop:

        lodsb                           ; Get next character

enter_power_loop:

        mov     ah,0                    ; Clear ah since ax is added to later
        call    test_digit              ; Test for a digit
        jc      form_power_value        ; Exit loop if not

        mov     cl,1                    ; Set power digit flag
        sal     di,1                    ; old*2
        add     ax,di                   ; old*2+digit
        sal     di,1                    ; old*4
        sal     di,1                    ; old*8
        add     di,ax                   ; old*10+digit
        cmp     di,HIGH_EXPONENT+bcd_count ; Check if exponent is too big
        jna     power_loop
;
;   The exponent is too large.
;
exponent_overflow:

        mov     ax,EXPONENT_TOO_BIG     ; Set return value
        fld     infinity                ; Return infinity
        test    bcd_sign,bcd_sign_bit   ; Return correctly signed infinity
        jz      exit                    ; Jump if not

        fchs                            ; Return -infinity
        jmp     short exit
;
;   No exponent was found.
;
no_exponent_found:

        dec     si                      ; Put si back on terminating character
        mov     ax,NO_EXPONENT          ; Set return value
        fld     indefinite              ; Set number to return
        jmp     short exit
;
;   The string examination is complete. Form the correct power of ten.
;
form_power_value:

        dec     si                      ; Backup string pointer to terminating
                                        ; character
        rcr     ch,1                    ; Test exponent sign flag
        jnc     positive_exponent

        neg     di                      ; Force exponent negative




positive_exponent:

        rcr     cl,1                    ; Test exponent digit flag
        jnc     no_exponent_found       ; If zero then no exponent digits were
                                        ; found
        add     di,power_ten            ; Form the final power of ten value
        cmp     di,LOW_EXPONENT         ; Check if the value is in range
        js      exponent_overflow       ; Jump if exponent is too small

        cmp     di,HIGH_EXPONENT
        jg      exponent_overflow

        inc     si                      ; Adjust string pointer
;
;   Convert the base 10 number to base 2.
; Note: 10**exp = 2**(exp*log2(10))
;
;   di has binary power of ten value to scale the BCD value with.
;
convert:

        dec     si                      ; Bump string pointer back to last character
        mov     ax,di                   ; Set power of ten to calculate
        or      ax,ax                   ; Test for positive or negative value
        js      get_negative_power
;
;   Scale the BCD value by a value >= 1.
;
        call    get_power_10            ; Get the adjustment power of ten
        fbld    bcd_form                ; Get the digits to use
        fmul                            ; Form converted result
        jmp     short done
;
;   Calculate a power of ten value > 1 then divide the BCD value with
; it. This technique is more exact than multiplying the BCD value by
; a fraction since no negative power of ten can be exactly represented
; in binary floating point. Using this technique will guarantee exact
; conversion of values like .5 and .0625.
;
get_negative_power:

        neg     ax                      ; Force positive power
        call    get_power_10            ; Get the adjustment power of ten
        fbld    bcd_form                ; Get the digits to use
        fdivr                           ; Divide fractions
        fxch                            ; Negate scale factor
        fchs
        fxch
;
;   All done, set return values.
;
done:

        fscale                          ; Update exponent of the result
        mov     ax,NUMBER_FOUND         ; Set return value
        fstp    st(1)                   ; Remove the scale factor

exit:

        mov     di,status_ptr
        mov     word ptr [di],ax        ; Set status of the conversion
        mov     di,end_ptr
        mov     word ptr [di],si        ; Set ending string address
        mov     sp,bp                   ; Deallocate local storage area
        pop     bp                      ; Restore caller's environment
        fwait                           ; Ensure all loads from memory are done
        ret     parms_size
;
;   Test if the character in al is an ASCII digit.
; If so then convert to binary, bump dx, and clear the carry flag.
; Else leave as is and set the carry flag.
;

549 


563 

564 

565 

566 

567 

568 

569 

570 


test^digi t : 


550 

cmp 

al, '9' 

? See if a digit 

551 

ja 

not_digi t 


552 




553 

cmp 

al , * 0 * 


554 

jb 

not^digi t 


555 




556 

Character is a digit. 


557 




558 

inc 

dx 

; Bump digit count 

559 

sub 

al, '0' 

; Convert to binary 

560 

ret 



561 




562 

Character is not a digit. 



not_digi t: 
stc 
ret 

asci i_to_f loating endp 
code ends 

end 


; Leave as is and set the carry flag 


ASSEMBLY COMPLETE, NO ERRORS FOUND 


APPENDIX E 


OVERVIEW 

Appendix E contains three trigonometric functions for
sine, cosine, and tangent. All accept a valid angle
argument between -2**62 and +2**62. They may be called
from PLM/86, PASCAL/86, FORTRAN/86 or ASM/86
functions.

They use the partial tangent instruction together with
trigonometric identities to calculate the result. They are
accurate to within 16 units of the low 4 bits of an
extended precision value. The functions are coded for
speed and small size, with tradeoffs available for greater
accuracy.

FPTAN and FPREM 

These trigonometric functions use the FPTAN instruction
of the NPX. FPTAN requires that the angle argument
be between 0 and PI/4 radians, 0 to 45 degrees.
The FPREM instruction is used to reduce the argument
down to this range. The low three quotient bits set by
FPREM identify which octant the original angle was in.

One FPREM instruction iteration can reduce angles of
10**18 radians or less in magnitude to PI/4. Larger values
can be reduced, but the meaning of the result is
questionable since any errors in the least significant bits of
that value represent changes of 45 degrees or more in the
reduced angle.
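
The reduction step can be modelled on a host machine. The following is a minimal sketch in Python, not 8087 code: the name reduce_angle is hypothetical, math.fmod stands in for FPREM, and ordinary double precision (53-bit significand) stands in for the NPX extended format (64-bit significand), so the magnitude at which precision runs out differs from the real part.

```python
import math

QUARTER_PI = math.pi / 4  # modulus handed to the reduction step


def reduce_angle(angle):
    """Model an FPREM-style reduction of `angle` by PI/4 (hypothetical helper).

    Returns (r, octant, sign): r is the remainder of |angle| in [0, PI/4),
    octant is the low three bits of the integer quotient, and sign records
    the sign of the original angle.
    """
    sign = -1 if angle < 0 else 1
    magnitude = abs(angle)
    r = math.fmod(magnitude, QUARTER_PI)            # exact remainder
    quotient = round((magnitude - r) / QUARTER_PI)  # whole number of octants
    return r, quotient & 7, sign


r, octant, sign = reduce_angle(-3.0)
print(r, octant, sign)

# For a huge double the gap between adjacent representable values is far
# larger than PI/4, so adding a whole radian does not even change the value.
huge = 2.0 ** 62
print(huge + 1.0 == huge)
```

The second print illustrates why the result for very large angles is questionable: once the spacing between representable values exceeds PI/4, the low-order bits of the argument no longer say which octant the angle is in.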

Cosine Uses Sine Code 

To save code space, the cosine function uses most of the
sine function code. The relation sin(|A| + PI/2) =
cos(A) is used to convert the cosine argument into a sine
argument. Adding PI/2 to the angle is performed by
adding 010 (binary) to the FPREM quotient bits identifying
the argument's octant.

It would be very inaccurate to add PI/2 to the cosine 
argument if it was very much different from PI/2. 

Depending on which octant the argument falls in, a
different relation will be used in the sine and tangent
functions. The program listings show which relations are
used.
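
The octant logic can be sketched in Python as follows. This is an illustrative model under stated assumptions, not the listing itself: the helper names are hypothetical, math.sin and math.cos stand in for the FPTAN-based evaluation of the reduced angle, and the relation chosen per octant follows the decision table given in the sine and cosine listing.

```python
import math

QUARTER_PI = math.pi / 4


def _reduce(magnitude):
    """Return (r, octant) for a non-negative angle, modelling FPREM by PI/4."""
    r = math.fmod(magnitude, QUARTER_PI)
    quotient = round((magnitude - r) / QUARTER_PI)
    return r, quotient & 7


def _sine_from_octant(r, octant, sign):
    """Pick one of the four relations by octant, then apply the sign."""
    relation = octant & 3
    if relation == 0:
        value = math.sin(r)                # octants 0 and 4
    elif relation == 1:
        value = math.cos(QUARTER_PI - r)   # octants 1 and 5
    elif relation == 2:
        value = math.cos(r)                # octants 2 and 6
    else:
        value = math.sin(QUARTER_PI - r)   # octants 3 and 7
    if octant & 4:                         # second half of the revolution
        value = -value
    return sign * value


def sine(angle):
    sign = -1.0 if angle < 0 else 1.0
    r, octant = _reduce(abs(angle))
    return _sine_from_octant(r, octant, sign)


def cosine(angle):
    # cos(A) = sin(|A| + PI/2): add 2 (binary 010) to the octant number
    # instead of adding PI/2 to the argument, and ignore the angle's sign.
    r, octant = _reduce(abs(angle))
    return _sine_from_octant(r, (octant + 2) & 7, 1.0)


for x in (-4.0, -1.0, 0.3, 1.0, 2.0, 5.5):
    print(x, sine(x), cosine(x))
```

Adding the quarter revolution to the octant number rather than to the argument leaves the reduced angle untouched, so no precision is lost in forming the cosine.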

For the tangent function, the ratio produced by FPTAN
will be directly evaluated. The sine function will use
either a sine or cosine relation depending on which
octant the angle fell into. On exit these functions will
normally leave a divide instruction in progress to maintain
concurrency.
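
FPTAN returns the tangent as a ratio Y/X, and the listings form the sine as Y/sqrt(X*X + Y*Y) and the cosine as X/sqrt(X*X + Y*Y). A minimal Python sketch of that final step is shown below. The function names are hypothetical, and fptan_model simply returns (tan(r), 1.0) as one valid ratio; the real instruction is free to return a different pair with the same quotient.

```python
import math


def fptan_model(r):
    """Stand-in for FPTAN: return (y, x) such that y / x == tan(r)."""
    return math.tan(r), 1.0


def sin_cos_from_ratio(y, x):
    """Recover sine and cosine from a tangent ratio y/x with x > 0."""
    hypotenuse = math.sqrt(x * x + y * y)
    return y / hypotenuse, x / hypotenuse


r = 0.3                        # any reduced angle with 0 < r <= PI/4
y, x = fptan_model(r)
s, c = sin_cos_from_ratio(y, x)
print(s, math.sin(r))
print(c, math.cos(r))
print(y / x, x / y)            # tangent, and its reciprocal for the
                               # octants that use the reverse divide
```

Only the choice of numerator differs between the sine and cosine paths, which is why both can share the same squaring, add, square root, and divide sequence.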

If the input angles are of a restricted range, such as from
0 to 45 degrees, then considerable optimization is
possible since full angle reduction and octant identification
are not necessary.

All three functions begin by looking at the value given
to them. Not a number (NAN), infinity, or empty
registers must be specially treated. Unnormals need to be
converted to normal values before the FPTAN instruction
will work correctly. Denormals will be converted to
very small unnormals which do work correctly for the
FPTAN instruction. The sign of the angle is saved to
control the sign of the result.

Within the functions, close attention was paid to
maintaining concurrent execution of the 8087 and host. The
concurrent execution will effectively hide the execution
time of the decision logic used in the program.
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LINE SOURCE 


$title(8087 Trigonometric Functions)

                public  sine,cosine,tangent
                name    trig_functions

$include(:f1:8087.anc)
;
;       Define 8087 word packing in the environment area.
;
cw_87           record  res871:3,infinity_control:1,rounding_control:2,
&               precision_control:2,error_enable:1,res872:1,
&               precision_mask:1,underflow_mask:1,overflow_mask:1,
&               zero_divide_mask:1,denormal_mask:1,invalid_mask:1

sw_87           record  busy:1,cond3:1,top:3,cond2:1,cond1:1,cond0:1,
&               error_pending:1,res873:1,precision_error:1,
&               underflow_error:1,overflow_error:1,zero_divide_error:1,
&               denormal_error:1,invalid_error:1

tw_87           record  reg7_tag:2,reg6_tag:2,reg5_tag:2,reg4_tag:2,
&               reg3_tag:2,reg2_tag:2,reg1_tag:2,reg0_tag:2

low_ip_87       record  low_ip:16

high_ip_op_87   record  hi_ip:4,res874:1,opcode_87:11

low_op_87       record  low_op:16

high_op_87      record  hi_op:4,res875:12

environment_87  struc                   ; 8087 environment layout
env87_cw        dw      ?
env87_sw        dw      ?
env87_tw        dw      ?
env87_low_ip    dw      ?
env87_ip_op     dw      ?
env87_low_op    dw      ?
env87_hi_op     dw      ?
environment_87  ends
;
;       Define 8087 related constants.
;
TOP_VALUE_INC   equ     sw_87 <0,0,1,0,0,0,0,0,0,0,0,0,0,0>

VALID_TAG       equ     0               ; Tag register values
ZERO_TAG        equ     1
SPECIAL_TAG     equ     2
EMPTY_TAG       equ     3
REGISTER_MASK   equ     7
;
;       Define local variable areas.
;
stack           segment stack 'stack'

local_area      struc
sw1             dw      ?               ; 8087 status value
local_area      ends

                db      size local_area+4       ; Allocate stack space
stack           ends

code            segment public 'code'
                assume  cs:code,ss:stack
;
;       Define local constants.
;
status          equ     [bp].sw1        ; 8087 status value location

                even
pi_quarter      dt      3FFEC90FDAA22168C235R   ; PI/4
indefinite      dd      0FFC00000R              ; Indefinite special value


;
;       This subroutine calculates the sine or cosine of the angle, given in
;       radians.  The angle is in ST(0), the returned value will be in ST(0).
;       The result is accurate to within 7 units of the least significant three
;       bits of the NPX extended real format.  The PLM/86 definition is:
;
;       sine:   procedure (angle) real external;
;               declare angle real;
;               end sine;
;
;       cosine: procedure (angle) real external;
;               declare angle real;
;               end cosine;
;
;       Three stack registers are required.  The result of the function is
;       defined as follows for the following arguments:
;
;               angle                                   result
;
;       valid or unnormal less than 2**62 in magnitude  correct value
;       zero                                            0 or 1
;       denormal                                        correct denormal
;       valid or unnormal greater than 2**62            indefinite
;       infinity                                        indefinite
;       NAN                                             NAN
;       empty                                           empty
;
;       This function is based on the NPX fptan instruction.  The fptan
;       instruction will only work with an angle of from 0 to PI/4.  With this
;       instruction, the sine or cosine of angles from 0 to PI/4 can be accurately
;       calculated.  The technique used by this routine can calculate a general
;       sine or cosine by using one of four possible operations:
;
;       Let R = |angle mod PI/4|
;           S = -1 or 1, according to the sign of the angle
;
;       1) sin(R)   2) cos(R)   3) sin(PI/4-R)   4) cos(PI/4-R)
;
;       The choice of the relation and the sign of the result follows the
;       decision table shown below based on the octant the angle falls in:
;
;       octant    sine    cosine
;
;         0       S*1       2
;         1       S*4       3
;         2       S*2     -1*1
;         3       S*3     -1*4
;         4      -S*1     -1*2
;         5      -S*4     -1*3
;         6      -S*2       1
;         7      -S*3       4
;


;
;       Angle to sine function is a zero or unnormal.
;
sine_zero_unnormal:

        fstp    st(1)                   ; Remove PI/4
        jnz     enter_sine_normalize    ; Jump if angle is unnormal
;
;       Angle is a zero.
;
        pop     bp                      ; Return the zero as the result
        ret
;
;       Angle is an unnormal.
;
enter_sine_normalize:

        call    normalize_value
        jmp     short enter_sine

cosine  proc                            ; Entry point to cosine

        fxam                            ; Look at the value
        push    bp                      ; Establish stack addressibility
        sub     sp,size local_area      ; Allocate stack space for status
        mov     bp,sp
        fstsw   status                  ; Store status value
        fld     pi_quarter              ; Setup for angle reduce
        mov     cl,1                    ; Signal cosine function
        pop     ax                      ; Get status value
        sahf                            ; ZF = C3, PF = C2, CF = C0
        jc      funny_parameter         ; Jump if parameter is
                                        ;  empty, NAN, or infinity
;
;       Angle is unnormal, normal, zero, denormal.
;
        fxch                            ; st(0) = angle, st(1) = PI/4
        jpe     enter_sine              ; Jump if normal or denormal
;
;       Angle is an unnormal or zero.
;
        fstp    st(1)                   ; Remove PI/4
        jnz     enter_sine_normalize
;
;       Angle is a zero.  cos(0) = 1.0
;
        fstp    st(0)                   ; Remove 0
        pop     bp                      ; Restore stack
        fld1                            ; Return 1
        ret






;
;       All work is done as a sine function.  By adding PI/2 to the angle
;       a cosine is converted to a sine.  Of course the angle addition is not
;       done to the argument but rather to the program logic control values.
;
sine:                                   ; Entry point for sine function

        fxam                            ; Look at the parameter
        push    bp                      ; Establish stack addressibility
        sub     sp,size local_area      ; Allocate local space
        mov     bp,sp
        fstsw   status                  ; Look at fxam status
        fld     pi_quarter              ; Get PI/4 value
        pop     ax                      ; Get fxam status
        sahf                            ; CF = C0, PF = C2, ZF = C3
        jc      funny_parameter         ; Jump if empty, NAN, or infinity
;
;       Angle is unnormal, normal, zero, or denormal.
;
        fxch                            ; ST(1) = PI/4, st(0) = angle
        mov     cl,0                    ; Signal sine
        jpo     sine_zero_unnormal      ; Jump if zero or unnormal
;
;       ST(0) is either a normal or denormal value.  Both will work.
;       Use the fprem instruction to accurately reduce the range of the given
;       angle to within 0 and PI/4 in magnitude.  If fprem cannot reduce the
;       angle in one shot, the angle is too big to be meaningful, > 2**62
;       radians.  Any roundoff error in the calculation of the angle given
;       could completely change the result of this function.  It is safest to
;       call this very rare case an error.
;
enter_sine:

        fprem                           ; Reduce angle
                                        ; Note that fprem will force a
                                        ; denormal to a very small unnormal
                                        ; Fptan of a very small unnormal
                                        ; will be the same very small
                                        ; unnormal, which is correct.
        mov     sp,bp                   ; Allocate stack space for status
        fstsw   status                  ; Check reduction was complete
                                        ; Quotient in C0,C3,C1
        pop     bx                      ; Get fprem status
        test    bh,high(mask cond2)     ; sin(2*N*PI+x) = sin(x)
        jnz     angle_too_big
;
;       Set and test for which eighth of the revolution the
;       angle fell into.
;
;       Assert: -PI/4 < st(0) < PI/4
;
        fabs                            ; Force the argument positive
                                        ; cond1 bit in bx holds the sign
        or      cl,cl                   ; Test for sine or cosine function
        jz      sine_select             ; Jump if sine function
;
;       This is a cosine function.  Ignore the original sign of the angle
;       and add a quarter revolution to the octant id from the fprem instruction.
;       cos(A) = sin(A+PI/2) and cos(|A|) = cos(A)
;
        and     ah,not high(mask cond1) ; Turn off sign of argument
        or      bh,high(mask busy)      ; Prepare to add 010 to C0,C3,C1
                                        ; status value in ax
                                        ; Set busy bit so carry out from
        add     bh,high(mask cond3)     ; C3 will go into the carry flag
        mov     al,0                    ; Extract carry flag
        rcl     al,1                    ; Put carry flag in low bit
        xor     bh,al                   ; Add carry to C0 not changing
                                        ; C1 flag
;
;       See if the argument should be reversed depending on the octant in
;       which the argument fell during fprem.
;




sine_select:

        test    bh,high(mask cond1)     ; Reverse angle if C1 = 1
        jz      no_sine_reverse
;
;       Angle was in octants 1,3,5,7.
;
        fsub                            ; Invert sense of rotation
        jmp     short do_sine_fptan     ; 0 < arg <= PI/4
;
;       Angle was in octants 0,2,4,6.
;       Test for a zero argument since fptan will not work if st(0) = 0
;
no_sine_reverse:

        ftst                            ; Test for zero angle
        mov     sp,bp                   ; Allocate stack space
        fstsw   status                  ; cond3 = 1 if st(0) = 0
        fstp    st(1)                   ; Remove PI/4
        pop     cx                      ; Get ftst status
        test    ch,high(mask cond3)     ; If C3=1, argument is zero
        jnz     sine_argument_zero
;
;       Assert: 0 < st(0) <= PI/4
;
do_sine_fptan:

        fptan                           ; TAN ST(0) = ST(1)/ST(0) = Y/X

after_sine_fptan:

        pop     bp                      ; Restore stack
        test    bh,high(mask cond3 + mask cond1) ; Look at octant angle fell into
        jpo     X_numerator             ; Calculate cosine for octants
                                        ; 1,2,5,6
;
;       Calculate the sine of the argument.
;       sin(A) = tan(A)/sqrt(1+tan(A)**2)     if tan(A) = Y/X then
;       sin(A) = Y/sqrt(X*X + Y*Y)
;
        fld     st(1)                   ; Copy Y value
        jmp     short finish_sine       ; Put Y value in numerator


;
;       The top of the stack is either NAN, infinity, or empty.
;
funny_parameter:

        fstp    st(0)                   ; Remove PI/4
        jz      return_empty            ; Return empty if no parm

        jpo     return_NAN              ; Jump if st(0) is NAN
;
;       st(0) is infinity.  Return an indefinite value.
;
        fprem                           ; ST(1) can be anything

return_NAN:
return_empty:

        pop     bp                      ; Restore stack
        ret                             ; Ok to leave fprem running
;
;       Simulate fptan with st(0) = 0
;
sine_argument_zero:

        fld1                            ; Simulate tan(0)
        jmp     after_sine_fptan        ; Return the zero value
;
;       The angle was too large.  Remove the modulus and dividend from the
;       stack and return an indefinite result.
;
angle_too_big:

        fcompp                          ; Pop two values from the stack
        fld     indefinite              ; Return indefinite
        pop     bp                      ; Restore stack
        fwait                           ; Wait for load to finish
        ret
;
;       Calculate the cosine of the argument.
;       cos(A) = 1/sqrt(1+tan(A)**2)          if tan(A) = Y/X then
;       cos(A) = X/sqrt(X*X + Y*Y)
;
X_numerator:

        fld     st(0)                   ; Copy X value
        fxch    st(2)                   ; Put X in numerator

finish_sine:

        fmul    st,st(0)                ; Form X*X + Y*Y
        fxch
        fmul    st,st(0)
        fadd                            ; st(0) = X*X + Y*Y
        fsqrt                           ; st(0) = sqrt(X*X + Y*Y)
;
;       Form the sign of the result.  The two conditions are the C1 flag from
;       FXAM in ah and the C0 flag from fprem in bh.
;
        and     bh,high(mask cond0)     ; Look at the fprem C0 flag
        and     ah,high(mask cond1)     ; Look at the fxam C1 flag
        or      bh,ah                   ; Even number of flags cancel
        jpe     positive_sine           ; Two negatives make a positive

        fchs                            ; Force result negative

positive_sine:

        fdiv                            ; Form final result
        ret                             ; Ok to leave fdiv running

cosine  endp


;
;       This function will calculate the tangent of an angle.
;       The angle, in radians, is passed in ST(0), the tangent is returned
;       in ST(0).  The tangent is calculated to an accuracy of 4 units in the
;       least three significant bits of an extended real format number.  The
;       PLM/86 definition is:
;
;       tangent: procedure (angle) real external;
;                declare angle real;
;                end tangent;
;
;       Two stack registers are used.  The result of the tangent function is
;       defined for the following cases:
;
;               angle                                   result
;
;       valid or unnormal < 2**62 in magnitude          correct value
;       0                                               0
;       denormal                                        correct denormal
;       valid or unnormal > 2**62 in magnitude          indefinite
;       NAN                                             NAN
;       infinity                                        indefinite
;       empty                                           empty
;
;       The tangent function uses the fptan instruction.  Four possible
;       relations are used:
;
;       Let R = |angle MOD PI/4|
;           S = -1 or 1 depending on the sign of the angle
;
;       1) tan(R)   2) tan(PI/4-R)   3) 1/tan(R)   4) 1/tan(PI/4-R)
;
;       The following table is used to decide which relation to use depending
;       on in which octant the angle fell.
;
;       octant    relation
;
;         0        S*1
;         1        S*4
;         2       -S*3
;         3       -S*2
;         4        S*1
;         5        S*4
;         6       -S*3
;         7       -S*2
;





tangent proc

        fxam                            ; Look at the parameter
        push    bp                      ; Establish stack addressibility
        sub     sp,size local_area      ; Allocate local variable space
        mov     bp,sp
        fstsw   status                  ; Get fxam status
        fld     pi_quarter              ; Get PI/4
        pop     ax
        sahf                            ; CF = C0, PF = C2, ZF = C3
        jc      funny_parameter
;
;       Angle is unnormal, normal, zero, or denormal.
;
        fxch                            ; st(0) = angle, st(1) = PI/4
        jpo     tan_zero_unnormal
;
;       Angle is either a normal or denormal.
;       Reduce the angle to the range -PI/4 < result < PI/4.
;       If fprem cannot perform this operation in one try, the magnitude of the
;       angle must be > 2**62.  Such an angle is so large that any rounding
;       errors could make a very large difference in the reduced angle.
;       It is safest to call this very rare case an error.
;
tan_normal:

        fprem                           ; Quotient in C0,C3,C1
                                        ; Convert denormals into unnormals
        mov     sp,bp                   ; Allocate stack space
        fstsw   status                  ; Quotient identifies octant
                                        ; original angle fell into
        pop     bx                      ; tan(PI*N+x) = tan(x)
        test    bh,high(mask cond2)     ; Test for complete reduction
        jnz     angle_too_big           ; Exit if angle was too big
;
;       See if the angle must be reversed.
;
;       Assert: -PI/4 < st(0) < PI/4
;
        fabs                            ; 0 <= st(0) < PI/4
                                        ; C1 in bx has the sign flag
        test    bh,high(mask cond1)     ; must be reversed
        jz      no_tan_reverse
;
;       Angle fell in octants 1,3,5,7.  Reverse it, subtract it from PI/4.
;
        fsub                            ; Reverse angle
        jmp     short do_tangent
;
;       Angle is either zero or an unnormal.
;
tan_zero_unnormal:

        fstp    st(1)                   ; Remove PI/4
        jz      tan_angle_zero
;
;       Angle is an unnormal.
;
        call    normalize_value
        jmp     tan_normal

tan_angle_zero:

        pop     bp                      ; Restore stack
        ret


;
;       Angle fell in octants 0,2,4,6.  Test for st(0) = 0, fptan won't work.
;
no_tan_reverse:

        ftst                            ; Test for zero angle
        mov     sp,bp                   ; Allocate stack space
        fstsw   status                  ; C3 = 1 if st(0) = 0
        fstp    st(1)                   ; Remove PI/4
        pop     cx                      ; Get ftst status
        test    ch,high(mask cond3)
        jnz     tan_zero

do_tangent:

        fptan                           ; tan ST(0) = ST(1)/ST(0)

after_tangent:
;
;       Decide on the order of the operands and their sign for the divide
;       operation while the fptan instruction is working.
;
        pop     bp                      ; Restore stack
        mov     al,bh                   ; Get a copy of fprem C3 flag
        and     ax,mask cond1 + high(mask cond3) ; Examine fprem C3 flag and
                                        ; fxam C1 flag
        test    bh,high(mask cond1 + mask cond3) ; Use reverse divide if in
                                        ; octants 1,2,5,6
        jpo     reverse_divide          ; Note! parity works on low
                                        ; 8 bits only!
;
;       Angle was in octants 0,3,4,7.
;       Test for the sign of the result.  Two negatives cancel.
;
        or      al,ah
        jpe     positive_divide

        fchs                            ; Force result negative

positive_divide:

        fdiv                            ; Form result
        ret                             ; Ok to leave fdiv running

tan_zero:

        fld1                            ; Force 1/0 = tan(PI/2)
        jmp     after_tangent
;
;       Angle was in octants 1,2,5,6.
;       Set the correct sign of the result.
;
reverse_divide:

        or      al,ah
        jpe     positive_r_divide

        fchs                            ; Force result negative

positive_r_divide:

        fdivr                           ; Form reciprocal of result
        ret                             ; Ok to leave fdiv running

tangent endp
;
;       This function will normalize the value in st(0).
;       Then PI/4 is placed into st(1).
;
normalize_value:

        fabs                            ; Force value positive
        fxtract                         ; 0 <= st(0) < 1
        fld1                            ; Get normalize bit
        fadd    st(1),st                ; Normalize fraction
        fsub                            ; Restore original value
        fscale                          ; Form original normalized
        fstp    st(1)                   ; Remove scale factor
        fld     pi_quarter              ; Get PI/4
        fxch
        ret

code    ends
        end

ASSEMBLY COMPLETE, NO ERRORS FOUND
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